Wednesday, February 13, 2019

OAC―Knowing Machine Learning Basics

Video 1.  Machine Learning with Oracle Analytics Cloud (YouTube link)

Tom Mitchell:
Machine Learning is the study of algorithms that learn from experience E with respect to some class of tasks T and performance measure P, such that the algorithms’ performance at tasks in T, as measured by P, improves with experience E.
The most important part of the definition above is the experience E, that is, the data that the algorithm trains on to produce an ML model. Almost always, it is the data that differentiates a great ML model from a good one.
The new Machine Learning (ML) capabilities in Oracle Analytics Cloud (OAC) are built into the reporting platform and are accessible through either a browser or a desktop application. You can use them to make predictions and intelligent suggestions from your ML models and data.
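To make T, P, and E concrete, here is a toy Python sketch. It is an illustration of the definition only, not anything OAC runs. The task T is to label an integer as "high" (1) or "low" (0), where the hidden rule is x >= 50. The performance measure P is accuracy on the integers 0 to 99. The experience E is a growing list of labeled examples. The learner `train_threshold` is hypothetical: it places its cut-off halfway between the largest "low" example and the smallest "high" example it has seen.

```python
# Task T: label an integer x as 1 ("high") or 0 ("low"); the hidden rule is x >= 50.
# Performance P: accuracy on a fixed test set, the integers 0..99.
# Experience E: a growing list of labeled training examples.

def train_threshold(examples):
    """Learn a cut-off halfway between the largest 'low' and smallest 'high' example."""
    max_low = max(x for x, y in examples if y == 0)
    min_high = min(x for x, y in examples if y == 1)
    return (max_low + min_high) / 2

def accuracy(threshold, test_set):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(1 for x, y in test_set if (1 if x >= threshold else 0) == y)
    return correct / len(test_set)

TEST_SET = [(x, 1 if x >= 50 else 0) for x in range(100)]
EXPERIENCE = [(10, 0), (70, 1), (45, 0), (55, 1)]

# Measure P after learning from the first 2, 3, and 4 examples of E.
accuracies = [
    accuracy(train_threshold(EXPERIENCE[:n]), TEST_SET)
    for n in range(2, len(EXPERIENCE) + 1)
]
print(accuracies)  # [0.9, 0.92, 1.0]
```

As the experience grows from two to four examples, the measured performance on the same task rises from 0.90 to 1.0. This is exactly the improvement that Mitchell's definition describes.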

In this article, the introduction to ML in OAC is based on Video 1, Machine Learning with Oracle Analytics Cloud, and covers the following topics:

Figure 1. Explain functionality applied to the LTV_BIN attribute

Use the 'Explain' functionality


To run Explain, right-click an attribute in a data set while in Data Visualization and select Explain (see Figure 1). OAC then runs its analysis behind the scenes and displays a pop-up that summarizes the findings in graphical and narrative form.

The power of the Explain feature is that it surfaces insights you may not have been aware of. This is where data discovery becomes largely independent of user bias and input. For example, when you apply Explain to "Customer Segment", ML determines:
  • Which factors are most relevant to highlight in relation to Customer Segment
  • What story your data can tell
  • Which scenarios and combinations of factors to look at
However, Explain is only as effective as the data set is well defined, and it also depends on the platform having enough processing power. In other words, we need to understand the data set we are exploring and make sure it contains the right facts before we start discovering.
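This post does not say which algorithms Explain uses. As a rough, hypothetical illustration of the kind of computation involved, the Python sketch below ranks candidate drivers of a target attribute by information gain. Information gain is a common measure of how much knowing one column reduces uncertainty about another. The functions and the toy customer rows are assumptions made for illustration; they are not OAC's implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting the rows on one attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def rank_drivers(rows, target):
    """Return (attribute, gain) pairs, strongest candidate driver first."""
    attributes = [a for a in rows[0] if a != target]
    gains = [(a, information_gain(rows, a, target)) for a in attributes]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)

# Hypothetical customer rows: 'income' determines the segment, 'region' does not.
rows = [
    {"income": "high", "region": "east", "segment": "premium"},
    {"income": "high", "region": "west", "segment": "premium"},
    {"income": "low",  "region": "east", "segment": "basic"},
    {"income": "low",  "region": "west", "segment": "basic"},
]
print(rank_drivers(rows, "segment"))  # [('income', 1.0), ('region', 0.0)]
```

The sketch also shows why the data set matters. If the column that actually drives the target is missing, no ranking method can surface it.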


Figure 2. Data flow step options including Train Multi-Classifier


Figure 3. Optimizer Options including Adam Optimizer

Create a Train Model for a Data Flow


As an advanced analyst, you can use scripts (e.g. Neural Network for Classification) to train models that you can then apply to other data sets to predict trends and patterns in the data.

Scripts define the interface and logic (code) for machine learning tasks. For example, you can use a training task (classification or numeric prediction) to train a model on known (labeled) data. Once the model is built, you can use it to score unknown (that is, unlabeled) data to:
  • Generate a data set within a data flow, or 
  • Provide a prediction dynamically within a visualization. 
Machine learning tasks are available as individual step types (for example, Train Binary, Apply Model).
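The train-then-score pattern described above can be sketched in plain Python. This is a hypothetical illustration and does not reflect OAC's script format. The names `train` and `apply_model` are made up, and a simple nearest-centroid classifier stands in for the Neural Network for Classification script. The feature columns (`annual_premium`, `tenure_years`) are also assumed for the example.

```python
from math import dist

def train(labeled_rows, feature_names, target):
    """Training task: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for row in labeled_rows:
        label = row[target]
        totals = sums.setdefault(label, [0.0] * len(feature_names))
        for i, name in enumerate(feature_names):
            totals[i] += row[name]
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: [t / counts[label] for t in totals]
                 for label, totals in sums.items()}
    return {"features": feature_names, "centroids": centroids}

def apply_model(model, unlabeled_rows):
    """Scoring task: predict the class whose centroid is nearest to each row.

    Features are left unscaled for brevity; a real model would normalize them.
    """
    predictions = []
    for row in unlabeled_rows:
        point = [row[name] for name in model["features"]]
        nearest = min(model["centroids"],
                      key=lambda label: dist(point, model["centroids"][label]))
        predictions.append(nearest)
    return predictions

# Hypothetical labeled (training) rows and unlabeled (new customer) rows.
known_customers = [
    {"annual_premium": 500, "tenure_years": 1, "LTV_BIN": "Low"},
    {"annual_premium": 600, "tenure_years": 2, "LTV_BIN": "Low"},
    {"annual_premium": 2000, "tenure_years": 10, "LTV_BIN": "High"},
    {"annual_premium": 2200, "tenure_years": 12, "LTV_BIN": "High"},
]
new_customers = [
    {"annual_premium": 700, "tenure_years": 3},
    {"annual_premium": 1900, "tenure_years": 9},
]
model = train(known_customers, ["annual_premium", "tenure_years"], "LTV_BIN")
print(apply_model(model, new_customers))  # ['Low', 'High']
```

Keeping training and scoring as separate steps mirrors the separate Train and Apply Model step types. Once a model has been trained, it can score any number of new data sets.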

For example, you could train a model on a data set that includes customer information and then apply this model to a set of new customer data that doesn't include Life-Time Value (LTV) information. If the model is based on the right factors and is, say, 97% accurate, it can predict reliably how many new customers in the data set, and which ones, are most likely to have a high customer lifetime value. In the demonstration below, Train Multi-Classifier is used to classify customers into four LTV bins (Low, Medium, High, and Very High):

  1. In the Data tab, select a data set that you want to use in the data flow.
  2. In the Data Flows tab, click Create and select Data Flow.
  3. Select the data set (e.g. Customer Insurance LTV - Local) that you want to use to create your train model, and click Add.
  4. In the data flow, click the Plus (+) symbol.
  5. This displays all available data flow step options (see Figure 2), including train model types (for example, Train Numeric Predictions, Train Multi-Classifier).[22]
  6. Click the train model type that you want to apply to the data set.
    • For example, Train Multi-Classifier is a multiclass train model that helps predict which LTV_BIN (i.e. Low, Medium, High, Very High) a new customer will be classified into.
  7. Refine the field details for the model as required:
    • If you want to change the script, then click Model Training Script.
    • Click Target to select the Data Set column that you want the train model to predict.
      • For example, you might model the customer data to predict each customer's LTV_BIN. An agent, for instance, may want to retain customers who potentially have a High LTV.
    • Update the remaining fields with values that are appropriate for the script you selected (see Figure 3).
  8. Click Save, enter a name and description and click OK to save the data flow with your choice of parameter values for the current train model script.
  9. Click Save Model, enter a name (e.g. Predict LTV Bin - NN) and description, and click Save to save the model.
    • You can now run the model script like any other data flow.
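Figure 3 lists Adam among the optimizer options. The Python sketch below shows the standard Adam update rule minimizing a one-dimensional toy loss, to illustrate what this optimizer does during training. The function name, the toy loss, and the hyperparameter defaults (the commonly cited Adam defaults) are assumptions; they are not OAC's script parameters.

```python
from math import sqrt

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g  # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (sqrt(v_hat) + eps)
    return x

# Toy loss f(x) = (x - 3)^2 with gradient 2(x - 3); its minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 3))
```

Because each step is scaled by the running gradient statistics, the early steps have roughly the size of `lr` regardless of how steep the loss is. This is one reason Adam is a popular default for training neural networks.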

Figure 4. Machine Learning view with Scripts and Models tabs


Figure 5.  Confusion Matrix indicates actual values against predicted values

Analyze How Effective the Train Model Is


Once you’ve created a train model, you can explore information about it and how it interprets data. You can use that information to modify the model.

When you run a train model data flow, it produces outputs which you can interpret, so that you can refine the model.
  1. Click the Navigator icon and select Machine Learning.
    • Machine Learning displays the Scripts and Models tabs (see Figure 4).
  2. To view the train model data flow outputs, display the Models tab.
    • This displays all models created.
  3. Click the menu icon for a model (e.g. Predict LTV Bin - NN) and select the Inspect option.
    • This displays four tabs: General, Quality, Permissions and Related.
  4. (Optional) Click General.
    • This page shows information about the model including:
      • Predicts - The name of the column that the model predicts (e.g. LTV_BIN).
      • Trained On - The name of the data set (e.g. Customer Insurance LTV - Local) that you're using to train the model.
      • Script - The name of the script (e.g. Neural Network for Classification) used in the model.
      • Class - The class of script (for example, Multiclass Classification).
  5. (Optional) Click Quality.
    • A configurable portion of the training data set is kept aside for validation. When the model is built, it's applied to this validation data set, whose labels are known. Metrics such as Accuracy, Precision, and Recall are then calculated by comparing the actual (label) values with the predicted values. The results are also shown as a matrix, which gives a quick summary of what was found during validation.
    • The Quality page displays:
      • A list of standard metrics, where the metrics displayed are related to the model selected. Each metric helps you determine how good the model is in terms of its prediction accuracy for the selected Data Set column to which you apply the train model.
      • A confusion matrix that compares the model's predictions on the validation data with the actual values.
        • The matrix shows actual values against predicted values to help you understand whether the predicted values are close to the actual values (see Figure 5).
  6. (Optional) Click Related.
    • The Related tab lists the data sets that the machine learning scripts emit when they run to build models. These data sets capture information specific to the script logic (e.g., multiclass classification), so that advanced users (data scientists) can gain more insight into the model that was built.
    • This page shows the training data including:
      • Training Data - The data set being used to train the model.
      • Generated Data - The data sets created by the script that you used to train the model. You may see different data sets if you select another script to train a model.
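The validation logic described for the Quality tab can be approximated in a few lines of Python. This sketch assumes a random holdout split, then computes a confusion matrix, accuracy, and per-class precision and recall from actual and predicted labels. The function names, the 20% default validation fraction, and the sample labels are hypothetical, and OAC's own computation may differ in its details.

```python
import random
from collections import Counter

def holdout_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and set a fraction aside for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]  # (training, validation)

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def quality_metrics(actual, predicted):
    """Return overall accuracy and per-class precision and recall."""
    matrix = confusion_matrix(actual, predicted)
    labels = sorted(set(actual) | set(predicted))
    accuracy = sum(matrix[(c, c)] for c in labels) / len(actual)
    per_class = {}
    for c in labels:
        true_pos = matrix[(c, c)]
        predicted_as_c = sum(matrix[(a, c)] for a in labels)  # column total
        actually_c = sum(matrix[(c, p)] for p in labels)      # row total
        per_class[c] = {
            "precision": true_pos / predicted_as_c if predicted_as_c else 0.0,
            "recall": true_pos / actually_c if actually_c else 0.0,
        }
    return accuracy, per_class

training, validation = holdout_split(range(10))
print(len(training), len(validation))  # 8 2

actual    = ["Low", "Low", "High", "High", "Medium"]
predicted = ["Low", "High", "High", "High", "Medium"]
accuracy, per_class = quality_metrics(actual, predicted)
print(accuracy)           # 0.8
print(per_class["High"])  # {'precision': 0.6666666666666666, 'recall': 1.0}
```

In the sample, one "Low" customer is predicted as "High". That single error lowers the precision for High (two of the three High predictions are correct) and the recall for Low (one of the two Low customers is found). The overall accuracy is four out of five.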

Score a Model


You can apply a train model within a data flow to generate a data set.
  1. In the Data tab, select a data set that you want to use in the data flow.
    • This can be any data set containing data that you want to apply your model to.
  2. In the Data Flows tab, click Create and select Data Flow to display the Add Data Set pane.
  3. Select the data set (e.g. Customer Insurance New) to which you want to apply the model, and click Add.
    • Select a data set with a structure similar to the one used to create the model.
  4. In the data flow, click the Plus (+) symbol.
  5. Click Apply Model from the available options.
  6. Select a model (e.g. Predict LTV Bin - NN) from the list of available models and click OK to confirm. 
  7. Select the Output columns that you want generated by this data flow, and update Column Name fields (e.g. LTV_BIN and PredictionConfidence) if required.
    • The output columns displayed in the Apply Model pane are created as a data set when the data flow runs. 
    • The available output columns depend on the model.
  8. In the data flow, click the Plus (+) symbol and select Save Data to add a Save Data step. 
  9. Click Save, enter a name (e.g. Customer w LTV BIN) and description and click OK to save the data flow with the selected model and output.
    • You can now run the data flow to create the appropriate output data set columns using the selected model.
A data set that you create using a scoring data flow can be used within a visualization in the same way as any other data set.
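Conceptually, an Apply Model step adds output columns to each input row. The Python sketch below assumes that the model returns a probability for each class and that PredictionConfidence is the probability of the predicted class. OAC's exact definition is not given here. `score_rows` and `toy_predict_proba` are hypothetical stand-ins, not OAC APIs.

```python
def score_rows(rows, predict_proba, label_column="LTV_BIN",
               confidence_column="PredictionConfidence"):
    """Return copies of the rows with a predicted label and its confidence appended."""
    scored = []
    for row in rows:
        probabilities = predict_proba(row)  # {class label: probability}
        label = max(probabilities, key=probabilities.get)
        scored.append({**row, label_column: label,
                       confidence_column: probabilities[label]})
    return scored

def toy_predict_proba(row):
    """Hypothetical stand-in for a trained model: longer tenure -> higher LTV bin."""
    if row["tenure_years"] >= 10:
        return {"Low": 0.05, "Medium": 0.15, "High": 0.6, "Very High": 0.2}
    return {"Low": 0.7, "Medium": 0.2, "High": 0.08, "Very High": 0.02}

new_customers = [{"customer_id": 1, "tenure_years": 2},
                 {"customer_id": 2, "tenure_years": 12}]
scored = score_rows(new_customers, toy_predict_proba)
for row in scored:
    print(row)
```

The input rows are copied rather than modified. This matches the idea that scoring produces a new output data set, which a Save Data step then persists.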

Figure 6.  Right-click the data set (e.g. Customer Insurance New) and select Create Scenario


Figure 7.  Create Scenario - Select Model dialog

Add Scenarios to a Project


You can apply scenarios within a project by selecting from a list of available machine learning models, joining the model to the existing data sets within the project, and then using the resulting model columns within a visualization. A scenario adds a set of virtual model output columns to create a blended report, much like adding data directly to a project to create a blended visualization. You can use the predicted values for the subset of data of interest within a specific visualization. The virtual data set columns don't physically exist; they represent the model outputs, and their values are generated dynamically when used in a visualization.
  1. Create or open the Data Visualization project in which you want to apply a scenario.
    • Confirm that you’re working in the Visualize canvas.
  2. To add a scenario, do one of the following:
    • Click Add, and select Create Scenario.
    • In the Data Elements pane, right-click the data set (e.g. Customer Insurance New) and select Create Scenario (see Figure 6).
  3. In the Create Scenario - Select Model dialog, select the name of the model (e.g. Predict LTV Bin - NN) and click OK (see Figure 7).
  4. In the Map Your Data to the Model dialog, specify various options:
    • In a project with multiple data sets, click Data Set to select the data set that you want to map to the model.
    • In the table, click Select Column to match a column to a model input.
      • Each model has inputs (that is, data elements) that must match corresponding columns from the data set. If a model input matches a column (for example, by column name and data type), the input and column are automatically matched. If a model input doesn't match any column, you must manually specify the appropriate data element.
      • Click Show all inputs to display the model inputs and the data elements with which they match. Alternatively, click Show unmatched inputs to display the model inputs that aren’t matched with a column.
  5. Click OK to add the resulting model columns to the Data Elements pane. You can now use the model columns with the data set columns.
  6. Drag and drop one or more data set and model columns from the Data Elements pane to drop targets in the Visualize canvas. You can also double-click the columns to add them to the canvas.
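Automatic matching in the Map Your Data to the Model dialog can be pictured with a short sketch. This hypothetical Python example assumes that an input matches a column with the same name (ignoring case) and the same data type; OAC's actual matching rules may differ. The sketch returns unmatched inputs separately, mirroring what Show unmatched inputs displays. The input and column names are invented.

```python
def match_inputs(model_inputs, columns):
    """Pair each model input with a column of the same name (ignoring case) and type.

    model_inputs and columns are dicts of {name: data type}.
    Returns (matches, unmatched), where unmatched inputs need manual mapping.
    """
    by_name = {name.lower(): (name, dtype) for name, dtype in columns.items()}
    matches, unmatched = {}, []
    for input_name, input_type in model_inputs.items():
        candidate = by_name.get(input_name.lower())
        if candidate and candidate[1] == input_type:
            matches[input_name] = candidate[0]
        else:
            unmatched.append(input_name)  # must be mapped manually
    return matches, unmatched

model_inputs = {"AGE": "number", "MARITAL_STATUS": "text", "INCOME": "number"}
columns = {"age": "number", "marital_status": "text", "annual_income": "number"}
matches, unmatched = match_inputs(model_inputs, columns)
print(matches)    # {'AGE': 'age', 'MARITAL_STATUS': 'marital_status'}
print(unmatched)  # ['INCOME']
```

Here INCOME has no column with the same name, so it would have to be mapped by hand, for example to annual_income.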
You can add one or more scenarios to the same or different data sets. In the Data Elements pane right-click the model, and select one of the following options:
  • Edit Scenario - Open the Map Your Data to the Model dialog to edit a scenario.
  • Reload Data - Update the model columns after you edit the scenario.
  • Remove from Project - Open the Remove Scenario dialog to remove a scenario.

Video 2.  Use Explain to Discover Data Insights in Oracle Analytics (YouTube link)

Video 3.  OAC Workshop : Basics of Training & Applying Predictive Models With Oracle DV (YouTube link)

Video 4.  Oracle Analytics Cloud: Augmented Analytics with AI and ML (YouTube link)


References

  1. Machine Learning with Oracle Analytics Cloud (YouTube)
  2. Use Machine Learning to Analyze Data (OAC)
  3. 3 Easy Ways to do ML with Oracle Analytics Cloud
  4. Oracle DV Workshop - Basics of Training & Applying Predictive Models With Oracle DV (YouTube)
  5. Create Data Flows in Oracle Data Visualization V5 (YouTube)
  6. How to Populate Quality Tab in ML Model Inspect page in Oracle Analytics Cloud
  7. Machine Learning Basics
  8. Machine Learning with Oracle Big Data Cloud (YouTube)
  9. Data Visualization (Forum)
  10. Oracle Data Visualization Desktop (Documentation)
  11. Oracle Analytics Library
  12. Visualizing Data and Building Reports in Oracle Analytics Cloud
  13. Using Oracle Data Visualization Cloud Service
  14. Oracle® Fusion Middleware User's Guide for Oracle Data Visualization (PDF)
  15. What's New for Oracle Data Visualization Desktop
  16. Machine Learning (Oracle A-Team Chronicles) 
  17. Oracle Underground BI & Dataviz (Blogger)
  18. Data Science for Business (Safari)
  19. Learn Modern Data Visualization with Oracle Analytics
  20. More A-Team Oracle Analytics (OAC) Blogs
  21. How Can I Use Oracle Machine Learning Models in Oracle Analytics?
  22. How Do I Choose a Predictive Model Algorithm?
