Saturday, June 24, 2017

How to Access OAuth Protected Resources Using Postman

To access an OAuth 2.0 protected resource, you need to provide an access token.  For example, in the new implementation of Oracle Event Hub Cloud Service, Kafka brokers are OAuth 2.0 protected resources.

In this article, we will demonstrate how to obtain an access token of "bearer" type using Postman.

OAuth 2.0


OAuth enables clients to access protected resources by obtaining an access token, which "The OAuth 2.0 Authorization Framework" defines as "a string representing an access authorization issued to the client", rather than by using the resource owner's credentials directly.

There are different access token types, for example, "bearer" and "mac".

Each access token type specifies the additional attributes (if any) sent to the client together with the "access_token" response parameter. It also defines the HTTP authentication method used to include the access token when making a protected resource request.

For example, in this article, you will learn how to retrieve a bearer token using Postman; the generated HTTP response will look like this:

{
    "access_token": "eyJ4NXQjUzI1Ni <snipped> M8Ei_VoT0kjc",
    "token_type": "Bearer",
    "expires_in": 3600
}



To prevent misuse, bearer tokens need to be protected from disclosure in storage and in transport.


Postman


Postman is a Google Chrome app for interacting with HTTP APIs. It presents you with a friendly GUI for constructing requests and reading responses. You can download it from the Postman website.  Note that Postman has moved from a Chrome app to a native app since this article was written.[7]

You can generate code snippets using Postman for sharing purposes (a better alternative, however, is to export/import a collection).  We will use the following snippet for illustration in this article.

POST /oauth2/v1/token HTTP/1.1
Host: psmdemo2.identity.cxxxx1.oxxxxdev.com
Content-Type: application/x-www-form-urlencoded
Accept: application/json
Authorization: Basic MDlCRjg0RjYzQTlENEY4MjlCOTM2REFERDVGNzk3NTlfQVBQSUQ6NzY1NDQxMjUtNDE4ZC00YzlmLTg2MzUtNTFmMjRhMjFjYjMw
Cache-Control: no-cache
Postman-Token: 55cfed4b-509c-5a6f-a415-8542d04fc7ad

grant_type=password&username=xxxxx@oracle.com&password=welcome1&scope=https://09XX11X11X9D4F829B936DADD5F79759.uscom-central-1.cxxxx1.oxxxxdev.com:443/psmdemo2-mytopicresource


Generating Bearer Token


To access OAuth protected resources, you need to retrieve an access token first.  In this example, we will work with an access token of the bearer type.

Based on the shared code snippet above, we need to send an HTTP POST request to the following URL:

https://psmdemo2.identity.cxxxx1.oxxxxdev.com/oauth2/v1/token

which is composed from the following lines of the snippet:

POST /oauth2/v1/token HTTP/1.1
Host: psmdemo2.identity.cxxxx1.oxxxxdev.com

Note that we have used https instead of http in the URL.

For the Authorization, we have specified the "Basic Auth" type with a Username and a Password; in the snippet, it shows up as below:

Authorization: Basic MDlCRjg0RjYzQTlENEY4MjlCOTM2REFERDVGNzk3NTlfQVBQSUQ6NzY1NDQxMjUtNDE4ZC00YzlmLTg2MzUtNTFmMjRhMjFjYjMw

In the "Header" part, we have specified two headers in addition to the "Authorization" header using "Bulk Edit" mode:

Content-Type:application/x-www-form-urlencoded
Accept:application/json
Authorization:Basic MDlCRjg0RjYzQTlENEY4MjlCOTM2REFERDVGNzk3NTlfQVBQSUQ6NzY1NDQxMjUtNDE4ZC00YzlmLTg2MzUtNTFmMjRhMjFjYjMw


In the "Body" part, we have copied the last line from the code snippets to it in raw mode:

grant_type=password&username=xxxxx@oracle.com&password=welcome1&scope=https://09XX11X11X9D4F829B936DADD5F79759.uscom-central-1.cxxxx1.oxxxxdev.com:443/psmdemo2-mytopicresource

Note that the above body is specific to the Oracle Identity Cloud Service (IDCS) implementation.  Similarly, the "Authorization" part requires us to specify the "Client ID" and "Client Secret" as username and password, which is also IDCS-specific.
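
If you prefer the command line, the same request can also be sketched with curl (a sketch only: replace <client-id> and <client-secret> with your own values; the other values are the placeholders from the snippet above):

curl -X POST \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -H "Accept: application/json" \
  -u "<client-id>:<client-secret>" \
  -d "grant_type=password&username=xxxxx@oracle.com&password=welcome1&scope=https://09XX11X11X9D4F829B936DADD5F79759.uscom-central-1.cxxxx1.oxxxxdev.com:443/psmdemo2-mytopicresource" \
  https://psmdemo2.identity.cxxxx1.oxxxxdev.com/oauth2/v1/token

Here the -u option performs the same Basic authentication that Postman encodes into the "Authorization" header.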

How to Use Bearer Token


To access OAuth protected resources, you specify the retrieved access token in the header of each subsequent HTTP request, in the following format:

Authorization:Bearer eyJ4NXQjUzI1Ni <snipped> M8Ei_VoT0kjc
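
For example, a curl request to a protected resource would carry the same header (the resource URL below is hypothetical):

curl -H "Authorization: Bearer eyJ4NXQjUzI1Ni <snipped> M8Ei_VoT0kjc" \
  https://<protected-resource-host>:<port>/<resource-path>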

Note that this access token will expire in one hour, as noted in the HTTP response:

"expires_in": 3600

Summary


In this article, we have demonstrated:
  • What a bearer token is
  • What an access token looks like
  • How to share a code snippet
    • We have shown that reverse-engineering the final Postman setup from a shared code snippet is not straightforward.  For example, the snippet doesn't tell us:
      • Which "Username" and "Password" to use.  In this case, we need to know that the application's "Client ID" and "Client Secret" are required.
    • Therefore, if you share code snippets with co-workers, you also need to add annotations that allow them to reproduce the HTTP requests to be sent.

Sunday, June 18, 2017

HiBench Suite―How to Build and Run the Big Data Benchmarks

As known from a previous article:
Three Benchmarks for SQL Coverage in HiBench Suite ― a Bigdata Micro Benchmark Suite
HiBench Suite is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilization.

When your big data platform (e.g., HDP) evolves, there comes a time when you need to upgrade your benchmark suite accordingly.

In this article, we will cover how to pick up the latest HiBench Suite (i.e., version 6.1) to work with Spark 2.1.



HiBench Suite


To download the master branch of HiBench Suite, you can visit its home page.  As of 06/18/2017, its latest version is 6.1.

To download, we have selected "Download ZIP" and saved it to our Linux system.


Maven


From the home page, you can select the "docs" link to view all available document links.
The build-hibench.md document tells you how to build HiBench Suite using Maven. For example, if you want to build all workloads in HiBench, you use the command below:

mvn -Dspark=2.1 -Dscala=2.11 clean package

This could be time consuming because hadoopbench (one of the workloads) relies on 3rd-party tools like Mahout and Nutch. The build process automatically downloads these tools for you. If you won't run those workloads, you can build only a specific framework (e.g., sparkbench) to speed up the build process.
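
For instance, Maven's standard reactor options can restrict the build to a single module. Assuming the Spark workloads live in a module named sparkbench (as in the HiBench source tree), a sketch would be:

# -pl selects the sparkbench module; -am also builds the modules it depends on
mvn -Dspark=2.1 -Dscala=2.11 -pl sparkbench -am clean package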

To get familiar with Maven, you can start with this pdf file.[8] In it, you will learn how to download Maven and how to set up your system to run it. Here we will just discuss some issues that we ran into while building all workloads using Maven.


Maven Installation Issues and Solutions


Proxy Server

Since our Linux system sits behind a firewall, we need to set up the following environment variables:
export http_proxy=http://your.proxy.com:80/
export https_proxy=http://your.proxy.com:80/
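
Note that these environment variables cover tools invoked from the shell (e.g., wget); Maven itself runs on the JVM and does not read them. The officially documented way is a <proxies> section in ~/.m2/settings.xml; passing the standard JVM proxy properties through MAVEN_OPTS is another commonly used approach (a sketch, with the same placeholder host and port):

# Pass JVM proxy properties to Maven; adjust host and port to your environment
export MAVEN_OPTS="-Dhttp.proxyHost=your.proxy.com -Dhttp.proxyPort=80 -Dhttps.proxyHost=your.proxy.com -Dhttps.proxyPort=80"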

Environment Setup

As instructed in the pdf file, we have set up the additional environment variables below:

export JAVA_HOME=~/JVMs/8u40_fcs
export PATH=/scratch/username/maven/apache-maven-3.5.0/bin:$PATH
export PATH=$JAVA_HOME/bin:$PATH
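
A quick way to confirm that the intended Maven and JDK are picked up:

# Should report Apache Maven 3.5.0 and the JAVA_HOME set above
mvn -version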


Maven Configuration & Debugging

POM stands for Project Object Model, which:
  • Is the fundamental unit of work in Maven
  • Is an XML file
  • Always resides in the base directory of the project as pom.xml

The POM contains information about the project and various configuration details used by Maven to build the project(s).

In the default ~/.m2/settings.xml, we have set the following entries:

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/scratch/username/.m2/repository</localRepository>
  <servers>
    <server>
      <id>central</id>
      <configuration>
        <httpConfiguration>
          <all>
            <connectionTimeout>120000</connectionTimeout>
            <readTimeout>120000</readTimeout>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>

First, we have set localRepository to a new location because of an issue described in [7,8]. Second, we have set longer timeouts for both connection and read.

If you run into issues with a plugin, you can use "help:describe" to display a list of its attributes and goals for debugging:

mvn help:describe -Dplugin=com.googlecode.maven-download-plugin:maven-download-plugin
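
If the goal listing alone is not enough, the same goal accepts a detail flag that also prints every parameter of each goal:

mvn help:describe -Dplugin=com.googlecode.maven-download-plugin:maven-download-plugin -Ddetail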

How to Run Sparkbench


To learn how to run a specific benchmark named sparkbench, you can click on the document link below:

run-sparkbench.md

Without further ado, we will focus on the configuration and tuning part of the task. For other details, please refer to the document.

New Configuration Files

In the new HiBench, there are two levels of configuration:

Global level:

${hibench.home}/conf/hibench.conf
${hibench.home}/conf/hadoop.conf
${hibench.home}/conf/spark.conf

Workload level:

${hibench.home}/conf/workloads/micro/terasort.conf
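
As an example of what these levels hold, the global conf/hibench.conf defines suite-wide properties such as the data scale profile, which workload files inherit. A line like the following (property name and values as in the HiBench documentation) selects it:

# in ${hibench.home}/conf/hibench.conf: pick a predefined data scale profile
# (tiny, small, large, huge, gigantic, bigdata)
hibench.scale.profile       large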

It has also introduced a new hierarchy (i.e., a category level such as micro, websearch, sql, etc.) to organize the workload runtime scripts:

${hibench.home}/<benchmark>/<framework>
  where <benchmark> could be:
    micro/terasort
    websearch/pagerank
    sql/aggregation
    sql/join
    sql/scan
  where <framework> could be:
    spark
    hadoop
    prepare
Similarly, the workload-specific configuration files are stored under the new category level:

${hibench.home}/conf/workloads/<benchmark.conf>
  where <benchmark.conf> could be:
    micro/terasort.conf
    websearch/pagerank.conf
    sql/aggregation.conf
    sql/join.conf
    sql/scan.conf
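
Putting the hierarchy together, a run of one workload would look roughly like the sketch below, where $HIBENCH_HOME stands for ${hibench.home} (script names and exact paths may differ in your checkout):

# prepare the input data for terasort, then run its Spark implementation
$HIBENCH_HOME/micro/terasort/prepare/prepare.sh
$HIBENCH_HOME/micro/terasort/spark/run.sh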


References

  1. HORTONWORKS DATA PLATFORM (HDP®)
  2. Readme (HiBench 6.1)
  3. HiBench Download
  4. How to build HiBench (HiBench 6.1)
  5. How to run sparkbench (HiBench 6.1)
  6. How-to documents (HiBench 6.1)
  7. Idiosyncrasies of ${HOME} that is an NFS Share (Xml and More)
  8. Apache Maven Build Tool (pdf)
  9. How do I set the location of my local Maven repository?
  10. Guide to Configuring Plug-ins (Apache Maven Project)
  11. Available Plugins (Apache Maven Project)
  12. MojoExecutionException
  13. Installing Maven Plugins (SourceForge.net)
  14. Download Plugin For Maven » 1.2.0
  15. Group: com.googlecode.maven-download-plugin

Tuesday, June 13, 2017

Linux sar Command: Using -o and -f in Pairs

System Activity Reporter (sar) is one of the most important tools for monitoring Linux servers. Using this command, you can analyze the history of different resource usages.

In this article, we will examine how to monitor the resource usage of servers (e.g., in a cluster) during the entire run of an application (e.g., a benchmark) using the following pair of sar commands:
  • Data Collection
    • nohup sar -A -o /tmp/sar.data 10 > /dev/null &
  • Record Extraction
    • sar -f /tmp/sar.data [-u | -d | -n DEV]
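
Note that, when the run length is known in advance, you do not have to background sar and kill it later: sar also accepts a count argument after the interval, so it can collect for a fixed window and exit on its own. For example:

# take one reading every 10 seconds, 360 times (one hour), then exit
sar -A -o /tmp/sar.data 10 360 > /dev/null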

Sar Command Options


In the data collection phase, we use the -o option to save data in a file in binary format; then we use the -f option, combined with other options (e.g., [-u | -d | -n DEV]), to extract records related to different statistics (e.g., CPU, I/O, network):

Main options

       -o [ filename ]
              Save the readings in the file in binary form. Each reading is in
              a separate record. The default value of the  filename  parameter
              is  the  current daily data file, the /var/log/sa/sadd file. The
              -o option is exclusive of the -f option.  All the data available
              from  the  kernel  are saved in the file (in fact, sar calls its
              data collector sadc with the option "-S ALL". See sadc(8) manual
              page).


       -f [ filename ]
              Extract records from filename (created by the -o filename flag).
              The default value of the filename parameter is the current daily
              data file, the /var/log/sa/sadd file. The -f option is exclusive
              of the -o option.

Others

       -u [ ALL ]
              Report CPU utilization. The ALL keyword indicates that  all  the
              CPU fields should be displayed.

       -d    Report activity for each block device  (kernels  2.4  and  newer
              only).

       -n { keyword [,...] | ALL }
              Report network statistics.


Monitoring the Entire Run of a Benchmark


In the illustration, we will use three benchmarks (i.e., scan / aggregation / join) in the HiBench suite as examples (see [2] for details).  At the beginning of each benchmark run, we start up sar commands on the servers of a cluster; then we run the Spark application of a specific workload; finally, we kill the sar processes at the end of the run.

run.sh
#!/bin/bash

if [ $# -ne 2 ]; then
  echo "usage: run.sh <workload> <target>"
  echo "  where <workload> could be:"
  echo "    scan"
  echo "    aggregation"
  echo "    join"
  echo "  where <target> could be:"
  echo "    mapreduce"
  echo "    spark/java"
  echo "    spark/scala"
  echo "    spark/python"
  exit 1
fi

workload=$1
target=$2
workloadsRoot=/data/hive/BDCSCE-HiBench/workloads

mkdir -p ~/$workload/$target

echo "start all sar commands ..."

./stats.sh start

while read -r vmIp
do
  echo "start stats on $vmIp"
  ./myssh opc@$vmIp "~/stats.sh start" &
done < vm.lst

# run a test in different workloads using different lang interfaces
$workloadsRoot/$workload/$target/bin/run.sh


echo "stop all sar commands ..."
./stats.sh stop

while read -r vmIp
do
  echo "stop stats on $vmIp"
  ./myssh opc@$vmIp "~/stats.sh stop" &
done < vm.lst


stats.sh

#!/bin/sh

case $1 in
  'start')
        # start a fresh collection: every 10 seconds into a binary file
        pkill sar
        rm -f /tmp/sar.data
        nohup sar -A -o /tmp/sar.data 10 > /dev/null &
        ;;
  'stop')
        # stop the collector and copy its data file to the home directory
        pkill sar
        scp /tmp/sar.data ~
        ;;
  *)
        echo "usage: $0 start|stop"
        ;;
esac

CPU Statistics


To view the overall CPU statistics, you can use the -u option as follows:

$ sar -f sar.data -u

03:39:28 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
03:39:38 PM     all      0.03      0.00      0.01      0.02      0.00     99.94
03:39:48 PM     all      0.05      0.00      0.05      0.02      0.01     99.88
<snipped>

Average:        all      0.09      0.00      0.02      0.02      0.00     99.86

I/O Statistics of Block Devices


To view the activity for each block device, you can use the -d option as follows:

$ sar -f sar.data -d


03:39:28 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
03:39:38 PM dev202-16      1.20      0.00     16.06     13.33      0.02     14.67      6.50      0.78
03:39:38 PM dev202-32      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM dev202-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM dev202-64      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM dev202-80      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-1      1.20      0.00     16.06     13.33      0.02     14.67      6.50      0.78
03:39:38 PM  dev251-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
<snipped>

Average:          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
Average:    dev202-16      1.22      0.00     15.79     12.99      0.01     11.85      6.57      0.80
Average:    dev202-32      0.85      0.00      8.92     10.46      0.01     10.27      4.18      0.36
Average:    dev202-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:    dev202-64      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:    dev202-80      0.21      0.00      1.74      8.43      0.00      0.30      0.08      0.00
Average:     dev251-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:     dev251-1      1.25      0.00     15.97     12.73      0.01     11.78      6.37      0.80
Average:     dev251-2      0.90      0.00      8.92      9.88      0.01     10.44      3.95      0.36
Average:     dev251-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:     dev251-4      0.22      0.00      1.74      8.00      0.00      0.28      0.08      0.00

If you are interested in the average tps of dev251-1:
              tps 
                     Indicate  the  number  of  transfers per second that were
                     issued to the device.  Multiple logical requests  can  be
                     combined  into  a  single  I/O  request  to the device. A
                     transfer is of indeterminate size.
you can specify the following command:
$ sar -f "$destDir/sar.data" -d | grep Average  | grep dev251-1 | awk '{print $3}'

Network Statistics


To view the overall statistics of network devices like eth0, bond, etc., you can use the -n option as follows:

Syntax: 
sar -n [VALUE]
The VALUE can be:
  • DEV: For network devices like eth0, bond, etc. 
  • EDEV: For network device failure details 
  • NFS: For NFS client info 
  • NFSD: For NFS server info 
  • SOCK: For sockets in use for IPv4 
  • IP: For IPv4 network traffic 
  • EIP: For IPv4 network errors 
  • ICMP: For ICMPv4 network traffic 
  • EICMP: For ICMPv4 network errors 
  • TCP: For TCPv4 network traffic 
  • ETCP: For TCPv4 network errors 
  • UDP: For UDPv4 network traffic 
  • SOCK6, IP6, EIP6, ICMP6, UDP6 : For IPv6 
  • ALL: For all above mentioned information.

$ sar -f sar.data -n DEV

03:39:28 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
03:39:38 PM      eth0     12.35     16.47      1.34      4.04      0.00      0.00      0.00
03:39:38 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:48 PM      eth0      9.63     14.64      1.17      4.03      0.00      0.00      0.00
03:39:48 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
<snipped>

Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
Average:         eth0     11.26     16.14      3.95      6.46      0.00      0.00      0.00
Average:           lo      1.23      1.23      0.33      0.33      0.00      0.00      0.00

If you are interested in the average rxkB/s or txkB/s of eth0:
              rxkB/s
                     Total number of kilobytes received per second.

              txkB/s
                     Total number of kilobytes transmitted per second.

you can specify the following commands:
sar -f "$destDir/sar.data" -n DEV|grep Average|grep eth0 |awk '{print $5}'
sar -f "$destDir/sar.data" -n DEV|grep Average|grep eth0 |awk '{print $6}'

References

  1. sar command for Linux system performance monitoring
  2. Three Benchmarks for SQL Coverage in HiBench Suite ― a Bigdata Micro Benchmark Suite