
Friday, February 18, 2022

OAC―Editions of Oracle Analytics Cloud

Figure 1.  Provisioning an OAC instance

Learning Objectives

  • Describe the editions of Oracle Analytics Cloud
  • Describe the solutions applicable for each OAC edition
  • Identify the prerequisites for OAC
  • Explain the concept of a compute shape

Video 1. Create a Service with Oracle Analytics Cloud (YouTube link)

Oracle Analytics Cloud Products 


Oracle Analytics Cloud offers you three product options:[4]
  • Oracle Analytics Cloud
  • Oracle Analytics Cloud Subscription
  • Oracle Analytics Cloud - Classic


Differences Between Products


The main difference between Oracle Analytics Cloud, Oracle Analytics Cloud Subscription, and Oracle Analytics Cloud - Classic is the way you deploy and manage your services on Oracle Cloud. The three products differ in the following areas:
  • Editions
  • Service Management
  • Infrastructure

Editions


Two editions are currently available: Professional and Enterprise. The features available with each edition depend on the product option and the regions accessible to you. See [4] for details, especially the availability of the different products by date and region.

Service Management


| Service Management | Oracle Analytics Cloud | Oracle Analytics Cloud Subscription | Oracle Analytics Cloud - Classic |
|---|---|---|---|
| Oracle-managed | ✓ Oracle provides lifecycle management and configuration; you can log service requests with Oracle Cloud support to request service updates. | ✓ Same as Oracle Analytics Cloud. | |
| Managed by you (the Oracle user) | | | ✓ You manage the service lifecycle and configuration, and have SSH access to the compute node VM. |
| Manage users and roles | ✓ | ✓ | ✓ |
| Create and size service | ✓ | ✓ | ✓ |
| Create database cloud service | | | ✓ |
| Administer database cloud service | | | ✓ |
| Back up and restore services | Oracle schedules and manages backups | Oracle schedules and manages backups | ✓ |
| Patch services | Oracle schedules and applies patches | Oracle schedules and applies patches | ✓ |
| Patch operating system | Oracle schedules and applies patches | Oracle schedules and applies patches | ✓ |
| Start and stop services | | | ✓ |
| Pause and resume services | ✓ | | |
| Monitor services | Oracle has direct access to diagnostic logs for troubleshooting issues | Oracle has direct access to diagnostic logs for troubleshooting issues | ✓ |

In the rows below "Managed by you", a ✓ marks a task that is the customer's responsibility for that product.

Infrastructure

| Infrastructure | Oracle Analytics Cloud | Oracle Analytics Cloud Subscription | Oracle Analytics Cloud - Classic |
|---|---|---|---|
| Oracle Cloud Infrastructure (Gen 2) | ✓ | | |
| Oracle Cloud Infrastructure (Gen 1) | | ✓ | ✓ |
| Oracle Cloud Infrastructure Classic | | | ✓ |
| Oracle Cloud Infrastructure Identity and Access Management - Identity Domains | ✓ Available on Oracle Cloud Infrastructure (Gen 2) to new customers in some Oracle Cloud regions. | | |
| Oracle Identity Cloud Service | ✓ | ✓ | ✓ |
| Load Balancer | ✓ An Oracle-managed load balancer is automatically created and configured for your service. | ✓ An Oracle-managed load balancer is automatically created and configured for your service. | ✓ When you enable Oracle Identity Cloud Service as the identity provider, an Oracle-managed load balancer is created and configured automatically for your service. |
| Cloud Storage Required | ✓ Uses Oracle Cloud Infrastructure Object Storage; a storage bucket is automatically created for your service. | ✓ Uses Oracle Cloud Infrastructure Object Storage; a storage bucket is automatically created for your service. | ✓ Uses Oracle Cloud Infrastructure Object Storage Classic; you can create the object storage container either before or while you set up your service. |
| Oracle Database Cloud Service Required | | | ✓ You must set up a database service for Oracle Analytics Cloud - Classic schemas and arrange a backup schedule. |
| Size Deployment by Shape | ✓ Various Oracle Compute Unit (OCPU) sizing options. | ✓ Various Oracle Compute Unit (OCPU) sizing options. | ✓ Standard and high-memory shapes; the list of available shapes may vary by region. |
| Size Deployment by Number of Users | ✓ Only on Oracle Cloud Infrastructure (Gen 2). | ✓ | |
| Scale Up and Scale Down | ✓ | ✓ | |
| Availability Domains | ✓ | ✓ | |

Each region has multiple isolated availability domains with separate power and cooling, interconnected by a low-latency network. When you create a service, you select the region where you want to deploy it, and Oracle automatically selects an availability domain.


Oracle Analytics Cloud - Professional Edition


With Professional Edition, you can:
  • Take control of your data
  • Create processes for business analytics applications and data collection
  • Discover insights on the data that you provide
  • Prepare data through interactive data flows
  • Explore data through grammar-based visualization
  • Coordinate business analytics within your department or organization
  • Use the Oracle Analytics Day by Day mobile application

Oracle Analytics Cloud - Enterprise Edition


Enterprise Edition offers all the features of Professional Edition; in addition, you can:
  • Build data models, reports, and analytic dashboards in an enterprise business intelligence environment
  • Design and publish pixel-perfect reports from your enterprise data
  • Migrate content from your existing on-premises environment
  • Perform a sensitivity analysis to test various data scenarios
  • Use the Oracle Analytics Day by Day mobile application
  • Maintain live and optimized connectivity to on-premises data warehouses

References

Sunday, February 17, 2019

OAC―Knowing the Dimensional Modelling Basics (1/2)

Video 1.  Dimensional Modeling – Declaring Dimensions (YouTube link)

Operational Processing vs Data Warehousing


One of the most important assets of any organization is its information. This asset is almost always used for two purposes: operational record keeping and analytical decision making.[1]



| | Operational Processing | Analytical Decision Making |
|---|---|---|
| Flavor | Transactional | Analytical |
| Main data flow | The operational systems are where you put the data in | The Data Warehousing and Business Intelligence (DW/BI) systems are where you get the data out |
| Optimization | Optimized to process transactions quickly | Optimized for high-performance queries |
| Transactions per operation | Almost always deal with one transaction record at a time | Often require that many transactions be searched and compressed into an answer set |
| History preservation | Typically do not maintain history, but rather update data to reflect the most current state | Typically demand that historical context be preserved to accurately evaluate the organization's performance over time |


Dimensional Modeling (DW/BI)


Dimensional modeling is a data modeling method and the preferred technique for presenting analytic data. It stores data in a form that makes the data relatively easy to retrieve once it is in the database.

This is why dimensional modeling is used mostly in data warehouses built for reporting. On the other hand, a dimensional model is not a good solution if the primary purpose of your data model is to reduce storage requirements, reduce redundancy, or speed up load times.[1]

Figure 1.  Star Schema (left) vs OLAP Cube (right)

3NF Model vs Dimensional Model


Although dimensional models are often instantiated in relational database management systems (RDBMS), they are quite different from third normal form (3NF) models, which seek to remove data redundancies:
  • 3NF Model (or Normalized Model)
    • Divides data into many discrete entities, each of which becomes a relational table
    • Sometimes referred to as entity-relationship (ER) models
    • Designed to reduce the duplication of data and ensure referential integrity
    • Designed to improve database processing while minimizing storage costs
    • Useful in operational processing because an update or insert transaction touches the database in only one place
      • However, they are too complicated for BI queries. 
  • Dimensional Model (Star Schemas and OLAP Cubes)
    • Both stars and cubes have a common logical design with recognizable dimensions; however, the physical implementation differs (see Figure 1):
      • Star Schemas
        • Referred to as star schemas because of their resemblance to a star-like structure when implemented in an RDBMS
      • OLAP Cubes
        • Referred to as online analytical processing (OLAP) cubes when implemented in a multidimensional database platform
        • Cubes can deliver superior query performance because of the precalculations, indexing strategies, and other optimizations
        • The downside is that you pay a load performance price for these capabilities, especially with large data sets
    • Contains the same information as a normalized model, but packages the data in a format that delivers user understandability, query performance, and resilience to change
Both 3NF and dimensional models can be represented in ERDs because both consist of joined relational tables; the key difference between 3NF and dimensional models is the degree of normalization.

Although the capabilities of OLAP technology are continuously improving, we generally recommend that detailed, atomic information be loaded into a star schema; optional OLAP cubes are then populated from the star schema.[1]

Dimensional Modeling Case Study


Consider the business scenario for a fast food chain:[2]
The business objective is to create a data model that can store and report the number of burgers and fries sold from a specific McDonald's outlet per day. 

Below are the steps used for dimensional modeling; a minimal schema sketch in Python follows the list: 
  • Identify the dimensions
    • Dimensions 
      • Describe the “who, what, where, when, how, and why” associated with the business process measurement event (e.g. a sales transaction).
    • In the above scenario, we have 3 dimensions - "food" (e.g. burgers and fries), "store" and "day"
      • Separate dimension tables are created for separate dimensions
        • The dimension tables contain the textual context (normally with set of descriptive nouns that characterize the business process) associated with a measurement event. 
  • Identify the measurement events (or facts)
    • Measurement Events
      • A measurement event in the physical world has a one-to-one relationship to a single row in the corresponding fact table
    • In the above scenario, we have 1 measurement event - "quantity"
      • A fact table is created for storing measures and foreign keys to the dimension tables
        • The fact table stores the number of food items sold in a "Quantity" column against the given store, food, and day columns. 
        • These store/food/day columns are foreign key columns referencing the primary keys of the respective dimension tables. 
  • Identify the attributes or properties of dimensions
    • Attributes (or Properties)
      • Each dimension might have number of different properties, but for a given context, not all of them are relevant for business
    • Knowing the properties let us decide what columns are required to be created in each dimension table.
    • In the above scenario, we could have
      • Food: name (burgers or fries)
      • Store: name, location, etc
      • Day: date
  • Identify the granularity of the measures
    • All the measurement rows in a fact table must be at the same grain (e.g., day or month). 
    • Having the discipline to create fact tables with a single level of detail ensures that measurements aren't inappropriately double-counted.
  • History Preservation (Optional)
    • Identifying which dimensions are slowly changing (or fast-changing, or unchanging) is the final step of modeling (see Video 1)
    • There are 8 different slowly changing dimension (SCD) types, but only 3 are commonly used:[3]
      • Type 0 - Fixed, non-changing attribute
      • Type 1 - Changing attribute, no history kept
      • Type 2 - Most complex, keeps historical changes
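
To make the case study concrete, the sketch below builds the resulting star schema with Python's built-in sqlite3 module. The table and column names (dim_food, dim_store, dim_day, fact_sales) and the sample rows are illustrative assumptions for this example, not taken from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Dimension tables: the "who/what/where/when" of the measurement event,
-- holding descriptive, textual attributes.
CREATE TABLE dim_food  (food_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE dim_day   (day_key   INTEGER PRIMARY KEY, date TEXT);

-- Fact table: one row per measurement event at a single, daily grain.
-- It holds the additive measure plus foreign keys into each dimension.
CREATE TABLE fact_sales (
    food_key  INTEGER REFERENCES dim_food(food_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    day_key   INTEGER REFERENCES dim_day(day_key),
    quantity  INTEGER
);
""")

-- hypothetical sample rows
cur.execute("INSERT INTO dim_food VALUES (1, 'burger'), (2, 'fries')")
cur.execute("INSERT INTO dim_store VALUES (1, 'Downtown', 'Springfield')")
cur.execute("INSERT INTO dim_day VALUES (1, '2019-02-17')")
cur.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 120), (2, 1, 1, 95)")

# A typical BI query: constrain and group on dimension attributes,
# then aggregate the additive fact.
for row in cur.execute("""
    SELECT f.name, SUM(s.quantity)
    FROM fact_sales s JOIN dim_food f ON s.food_key = f.food_key
    GROUP BY f.name
"""):
    print(row)   # ('burger', 120), ('fries', 95)
```

Note how every query enters the fact table the same way: join on a dimension key, constrain or group on dimension attributes, and sum the measure.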

Figure 2.  Sample rows from a dimension table with denormalized hierarchies

Summary

  • Dimensional Model (cf. Normalized Model)
    • Dimensional schema is simpler and symmetric
      • Business users benefit from the simplicity because the data is easier to understand and navigate
      • Database optimizers process these simple schemas with fewer joins more efficiently
      • Every dimension is equivalent; all dimensions are symmetrically-equal entry points into the fact table.
    • Dimensional models are gracefully extensible to accommodate change
      • With dimensional models, you can add completely new dimensions to the schema as long as a single value of that dimension is defined for each existing fact row.
  • Fact Tables
    • Fact tables tend to be deep in terms of the number of rows, but narrow in terms of the number of columns
    • The most useful facts are numeric and additive, such as dollar sales amount. 
      • Additivity is crucial because BI applications rarely retrieve a single fact table row.
        • However, you will see that facts are sometimes semi-additive (e.g., account balances) or even non-additive (e.g., unit prices). 
    • Facts are often described as continuously valued 
    • Fact tables usually make up 90 percent or more of the total space consumed by a dimensional model. 
    • All fact tables have two or more foreign keys that connect to the dimension tables' primary keys.
    • Fact tables (or bridge table) express many-to-many relationships
  • Dimension Tables
    • Dimension tables tend to be shallow in terms of the number of rows, but wide in terms of the number of columns
    • Each dimension is defined by a single primary key (surrogate key or natural key), which serves as the basis for referential integrity with any given fact table to which it is joined.
    • Robust dimension attributes deliver robust analytic slicing-and-dicing capabilities.
      • In many ways, the data warehouse is only as good as the dimension attributes; the analytic power of the DW/BI environment is directly proportional to the quality and depth of the dimension attributes.
      • Dimension attributes serve as the primary source of query constraints, groupings, and report labels.
        • You should strive to minimize the use of codes or cryptic abbreviations in dimension tables by replacing them with more verbose textual attributes.
    • Dimension tables often represent hierarchical relationships (See Figure 2)
      • For example, products roll up into brands and then into categories
      • For each row in the product dimension, you should store the associated brand and category description. 
      • The hierarchical descriptive information is stored redundantly in the spirit of ease of use and query performance.
      • You should resist the habitual urge to normalize data (i.e., snowflaking)
        • You should almost always trade off dimension table space for simplicity and accessibility.
        • Because dimension tables typically are geometrically smaller than fact tables, improving storage efficiency by normalizing or snowflaking has virtually no impact on the overall database size. 
  • Fact or Dimension Attribute
    • When triaging operational source data, it is sometimes unclear whether a numeric data element is a fact or a dimension attribute (a heuristic sketch in Python follows this list).  It is
      • A fact if
        • The column is a measurement that takes on lots of values and participates in calculations
      • A dimension attribute if
        • The column is a discretely valued description that is more or less constant and participates in constraints and row labels
      • Note:
        • Continuously valued numeric observations are almost always facts; discrete numeric observations drawn from a small list are almost always dimension attributes.
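
This triage rule can be turned into a rough heuristic. The sketch below is my own illustration, not from the source: a numeric column whose values are mostly distinct is flagged as a candidate fact, while a column drawn from a small discrete list is flagged as a candidate dimension attribute.

```python
import pandas as pd

def triage_column(series: pd.Series, distinct_ratio: float = 0.5) -> str:
    """Continuously valued numerics are almost always facts; discrete
    values drawn from a small list are almost always dimension attributes."""
    is_continuous = (pd.api.types.is_numeric_dtype(series)
                     and series.nunique() / len(series) > distinct_ratio)
    return "candidate fact (measure)" if is_continuous else "candidate dimension attribute"

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "sales_amount": [19.99, 4.50, 7.25, 103.10],            # mostly distinct -> fact
    "store_type":   ["mall", "street", "mall", "street"],   # small list -> dimension
})
for col in df.columns:
    print(col, "->", triage_column(df[col]))
```

The 0.5 cutoff is arbitrary; in practice you would still review each flagged column against how the business uses it (calculations versus constraints and labels).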

References

  1. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
  2. What is dimensional modelling?
  3. Dimensional Modeling – Declaring Dimensions (YouTube)
  4. Learn Modern Data Visualization with Oracle Analytics
  5. A-Team Oracle Analytics (OAC) Blogs

Wednesday, February 13, 2019

OAC―Knowing Machine Learning Basics

Video 1.  Machine Learning with Oracle Analytics Cloud (YouTube link)

Tom Mitchell:
Machine Learning is the study of algorithms that learn from experience E with respect to some class of tasks T and performance measure P, such that the algorithms’ performance at tasks in T, as measured by P, improves with experience E.
The most important part of the definition above is the experience E or the data the algorithm (a.k.a. ML model) trains on. Almost always it is the data that differentiates a great ML model from a good one.
The new Machine Learning (ML) capabilities in Oracle Analytics Cloud (OAC) are built into the reporting platform and are accessible through either a browser or a desktop application. You can use them to make predictions and intelligent suggestions from your ML models and data.

In this article, the introduction of ML in OAC is based on Video 1 (Machine Learning with Oracle Analytics Cloud), and the following topics are covered:
  • Use the 'Explain' functionality
  • Create a train model for a data flow
  • Analyze how effective the train model is
  • Score a model
  • Add scenarios to a project

Figure 1. Explain functionality provided on LTV_BIN attribute

Use the 'Explain' functionality


To run Explain, simply right-click on an attribute in a data set while in Data Visualization and select Explain (see Figure 1). Some serious algorithm crunching happens behind the scenes and then you get a popup of the findings summarized in graphical and narrative form.

The power of the Explain feature is that it surfaces insights you may not have been aware of. This is where data discovery is truly independent of user bias and input. For example, when applying Explain to “Customer Segment”, ML can decide:
  • What factors make more sense to highlight in relation to Customer Segment
  • What story your data can tell
  • What different scenarios and combination of factors to look at
However, the effectiveness of Explain depends on how well defined the data set is and on whether the platform has enough processing power.  In other words, we need to be aware of what data set we are exploring and make sure it has the right facts before starting to discover.


Figure 2. Data flow step options including Train Multi-Classifier


Figure 3. Optimizer Options including Adam Optimizer

Create a Train Model for a Data Flow


As an advanced analyst, you can use scripts (e.g. Neural Network for Classification) to train data models that you then apply to other sets of data to predict trends and patterns in the data.

Scripts define the interface and logic (code) for machine learning tasks. You can use a training task (classification or numeric prediction), for example, to train a model on known (labeled) data. Once the model is built, it can be used to score unknown (that is, unlabeled) data to:
  • Generate a data set within a data flow, or 
  • Provide a prediction dynamically within a visualization. 
Machine learning tasks are available as individual step types (for example, Train Binary, Apply Model).

For example, you could train a model on a set of data that includes customer information and then apply this model to a set of new customer data that doesn't include Life-Time Value (LTV) information. Because the model is based on specific factors and is 97% accurate, it can predict which new customers in the data set, and how many, most likely have a high customer lifetime value.  In the demonstration below, Train Multi-Classifier is used to classify customers into 4 LTV bins (Low, Medium, High, Very High); a conceptual sketch in Python follows the steps:

  1. In the Data tab, select a data set that you want to use in the data flow.
  2. In the Data Flows tab, click Create and select Data Flow.
  3. Select the data set (e.g. Customer Insurance LTV - Local) that you want to use to create your train model, and click Add.
  4. In the data flow, click the Plus (+) symbol.
  5. This displays all available data flow step options (see Figure 2), including train model types (for example, Train Numeric Predictions, Train Multi-Classifier).[22]
  6. Click the train model type that you want to apply to the data set.
    • For example, Train Multi-Classifier is a multiclass train model that helps predict which LTV_BIN (i.e. Low, Medium, High, Very High) a new customer will be classified into.
  7. Refine the field details for the model as required:
    • If you want to change the script, then click Model Training Script.
    • Click Target to select a Data Set column that you want to apply the train model to.
      • For example, you might want to model the customer data to predict a person's LTV_BIN. Consider an agent who is interested in keeping customers who have potentially High LTV.
    • Update the remaining fields with values that are appropriate for the script you selected (see Figure 3).
  8. Click Save, enter a name and description and click OK to save the data flow with your choice of parameter values for the current train model script.
  9. Click Save Model, enter a name (e.g. Predict LTV Bin - NN) and description, and click Save to save the model.
    • You can now run the model script like any other data flow.
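
The steps above are all driven from the OAC data-flow UI, so there is no code to write. Purely as a conceptual illustration, here is a hedged sketch of the same idea in Python with scikit-learn (my choice of library, not an API that OAC exposes): train a multiclass "LTV bin" classifier on labeled customer data, holding out a validation set like the one behind the Quality tab. The file name and feature columns are assumptions.

```python
# Conceptual sketch only: OAC configures Train Multi-Classifier in the
# data-flow UI; this scikit-learn code just illustrates the same idea.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labeled data, standing in for "Customer Insurance LTV - Local".
df = pd.read_csv("customer_insurance_ltv.csv")        # assumed file name
X = df[["age", "income", "num_policies"]]             # assumed feature columns
y = df["LTV_BIN"]                                     # target: Low/Medium/High/Very High

# Keep a portion of the labeled data aside for validation, as OAC does.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A small neural network, loosely analogous to the
# "Neural Network for Classification" training script.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X_train, y_train)

# Quality-tab-style diagnostics: accuracy/precision/recall plus a
# confusion matrix of actual vs. predicted labels.
pred = model.predict(X_val)
print(classification_report(y_val, pred))
print(confusion_matrix(y_val, pred))
```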

Figure 4. Machine Learning view with Scripts and Models tabs


Figure 5.  Confusion Matrix indicates actual values against predicted values

Analyze How Effective the Train Model Is


Once you’ve created a train model, you can explore information about it and how it interprets data. You can use that information to modify the model.

When you run a train model data flow, it produces outputs that you can interpret and use to refine the model.
  1. Click the Navigator icon and select Machine Learning.
    • Machine Learning displays the Scripts and Models tabs (see Figure 4).
  2. To view the train model data flow outputs, display the Models tab.
    • This displays all models created.
  3. Click the menu icon for a model (e.g. Predict LTV Bin - NN) and select the Inspect option.
    • This displays four tabs: General, Quality, Permissions and Related.
  4. (Optional) Click General.
    • This page shows information about the model including:
      • Predicts - The name of whatever the model is trying to predict (e.g. LTV_BIN).
      • Trained On - The name of the data set (e.g. Customer Insurance LTV - Local) that you're using to train the model.
      • Script - The name of the script (e.g. Neural Network for Classification) used in the model.
      • Class - The class of script (for example, Multiclass Classification).
  5. (Optional) Click Quality.
    • A portion (configurable) of the training data set is kept aside for validation purposes. When the model is built, it's applied to the validation data set, whose labels are known. Metrics such as Accuracy, Precision, and Recall are then calculated from the actual (label) and predicted values. The information is also shown as a matrix that gives a quick, simple summary of what was found during validation. 
    • The Quality page displays:
      • A list of standard metrics, where the metrics displayed are related to the model selected. Each metric helps you determine how good the model is in terms of its prediction accuracy for the selected Data Set column to which you apply the train model.
      • The matrix shows the state of the data used to make the predictions.
        • The matrix indicates actual values against predicted values to help you understand if the predicted values are close to the actual values (see Figure 5).
  6. (Optional) Click Related.
    • The Related tab captures data sets emitted by the machine learning scripts when they run to build models. These data sets capture information specific to the script logic (e.g., multiclass classification), so that advanced users (data scientists) can get more insight into the model that was built.
    • This page shows the training data including:
      • Training Data - The data set being used to train the model.
      • Generated Data - The data sets created by the script that you use for the training model. You may see different data sets if you select another script to train a model.

Score a Model


You can apply a train model within a data flow to generate a data set; a conceptual scoring sketch follows the steps.
  1. In the Data tab, select a data set that you want to use in the data flow.
    • This can be any data set containing data that you want to apply your model to.
  2. In the Data Flows tab, click Create and select Data Flow to display the Add Data Set pane.
  3. Select the data set (e.g. Customer Insurance New) to which you want to apply the model, and click Add.
    • Select a data set like the one used to create the model.
  4. In the data flow, click the Plus (+) symbol.
  5. Click Apply Model from the available options.
  6. Select a model (e.g. Predict LTV Bin - NN) from the list of available models and click OK to confirm. 
  7. Select the Output columns that you want generated by this data flow, and update Column Name fields (e.g. LTV_BIN and PredictionConfidence) if required.
    • The output columns displayed in the Apply Model pane are created as a data set when the data flow runs. 
    • The output columns are relevant to the model. 
  8. In the data flow, click the Plus (+) symbol and select Save Data to add a Save Data step. 
  9. Click Save, enter a name (e.g. Customer w LTV BIN) and description and click OK to save the data flow with the selected model and output.
    • You can now run the data flow to create the appropriate output data set columns using the selected model.
A data set that you create using a scoring data flow can be used within a visualization in the same way as any other data set.
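
Conceptually, the Apply Model step is a batch "predict" over a new data set. Continuing the hedged scikit-learn sketch from earlier (again, an illustration rather than anything OAC exposes), scoring unlabeled customers and emitting LTV_BIN and PredictionConfidence-style columns might look like this; file and column names are assumptions:

```python
import pandas as pd

# Reuses the trained `model` from the previous sketch.
# Score new customers who have no LTV_BIN yet (analogous to Apply Model).
new_df = pd.read_csv("customer_insurance_new.csv")    # assumed file name
X_new = new_df[["age", "income", "num_policies"]]     # same assumed features

new_df["LTV_BIN"] = model.predict(X_new)
# Probability of the predicted class, analogous to PredictionConfidence.
new_df["PredictionConfidence"] = model.predict_proba(X_new).max(axis=1)

# Persist the scored output, like the Save Data step in the data flow.
new_df.to_csv("customer_w_ltv_bin.csv", index=False)
```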

Figure 6.  Right-click the data set (e.g. Customer Insurance New) and select Create Scenario


Figure 7.  Create Scenario - Select Model dialog

Add Scenarios to a Project


You can apply scenarios within a project by selecting from a list of available machine learning models, joining the model to the existing data sets within the project, and then using the resulting model columns within a visualization. A scenario adds a set of virtual model output columns to create a blended report, much like adding data directly to a project to create a blended visualization. You can use the predicted values for the subset of the data of interest within a specific visualization. The virtual data set columns don't physically exist; they represent the model outputs, and their values are generated dynamically when they're used in a visualization.
  1. Create or open the Data Visualization project in which you want to apply a scenario.
    • Confirm that you’re working in the Visualize canvas.
  2. To add a scenario, do one of the following:
    • Click Add, and select Create Scenario.
    • In the Data Elements pane, right-click the data set (e.g. Customer Insurance New) and select Create Scenario (see Figure 6).
  3. In the Create Scenario - Select Model dialog, select the name of the model (e.g. Predict LTV Bin - NN) and click OK (see Figure 7).
  4. In the Map Your Data to the Model dialog, specify various options:
    • In a project with multiple data sets, click Data Set to select a data set that you want to map to the model.
    • In the table, click Select Column to match a column to a model input.
      • Each model has inputs (that is, data elements) that must match corresponding columns from the data set. If a model input matches a column (for example, by column name), the input and column are matched automatically. If a model input doesn't match any column, you must manually specify the appropriate data element.
      • Click Show all inputs to display the model inputs and the data elements with which they match. Alternatively, click Show unmatched inputs to display the model inputs that aren’t matched with a column.
  5. Click OK to add the resulting model columns to the Data Elements pane. You can now use the model columns with the data set columns.
  6. Drag and drop one or more data set and model columns from the Data Elements pane to drop targets in the Visualize canvas. You can also double-click the columns to add them to the canvas.
You can add one or more scenarios to the same or different data sets. In the Data Elements pane, right-click the model and select one of the following options:
  • Edit Scenario - Open the Map Your Data to the Model dialog to edit a scenario.
  • Reload Data - Update the model columns after you edit the scenario.
  • Remove from Project - Open the Remove Scenario dialog to remove a scenario.

Video 2.  Use Explain to Discover Data Insights in Oracle Analytics (YouTube link)

Video 3.  OAC Workshop : Basics of Training & Applying Predictive Models With Oracle DV (YouTube link)

Video 4.  Oracle Analytics Cloud: Augmented Analytics with AI and ML (YouTube link)


References

  1. Machine Learning with Oracle Analytics Cloud (YouTube)
  2. Use Machine Learning to Analyze Data (OAC)
  3. 3 Easy Ways to do ML with Oracle Analytics Cloud
  4. Oracle DV Workshop - Basics of Training & Applying Predictive Models With Oracle DV (YouTube)
  5. Create Data Flows in Oracle Data Visualization V5 (YouTube)
  6. How to Populate Quality Tab in ML Model Inspect page in Oracle Analytics Cloud
  7. Machine Learning Basics
  8. Machine Learning with Oracle Big Data Cloud (YouTube)
  9. Data Visualization (Forum)
  10. Oracle Data Visualization Desktop (Documentation)
  11. Oracle Analytics Library
  12. Visualizing Data and Building Reports in Oracle Analytics Cloud
  13. Using Oracle Data Visualization Cloud Service
  14. Oracle Fusion Middleware User's Guide for Oracle Data Visualization (PDF)
  15. What's New for Oracle Data Visualization Desktop
  16. Machine Learning (Oracle A-Team Chronicles) 
  17. Oracle Underground BI & Dataviz (Blogger)
  18. Data Science for Business (Safari)
  19. Learn Modern Data Visualization with Oracle Analytics
  20. A-Team Oracle Analytics (OAC) Blogs
  21. How Can I Use Oracle Machine Learning Models in Oracle Analytics?
  22. How Do I Choose a Predictive Model Algorithm?
