Saturday, June 24, 2017

How to Access OAuth Protected Resources Using Postman

To access an OAuth 2.0 protected resource, you need to provide an access token.  For example, in the new implementation of Oracle Event Hub Cloud Service, Kafka brokers are OAuth 2.0 protected resources.

In this article, we will demonstrate how to obtain an access token of "bearer" type using Postman.

OAuth 2.0


OAuth enables clients to access protected resources by obtaining an access token, which "The OAuth 2.0 Authorization Framework" defines as "a string representing an access authorization issued to the client", rather than by using the resource owner's credentials directly.

There are different access token types, for example, "bearer" and "mac".

Each access token type specifies the additional attributes (if any) sent to the client together with the "access_token" response parameter. It also defines the HTTP authentication method used to include the access token when making a protected resource request.

For example, in this article, you will learn how to retrieve a bearer token using Postman; the generated HTTP response will look like this:

{
    "access_token": "eyJ4NXQjUzI1Ni <snipped> M8Ei_VoT0kjc",
    "token_type": "Bearer",
    "expires_in": 3600
}



To prevent misuse, bearer tokens need to be protected from disclosure in storage and in transport.


Postman


Postman is a Google Chrome app for interacting with HTTP APIs. It presents you with a friendly GUI for constructing requests and reading responses. You can download it from the Postman website.  Note that Postman has moved from a Chrome app to a native app since this article was written.[7]

You can generate code snippets using Postman for sharing purposes (a better alternative, however, is to export/import a collection).  We will use the following snippet for illustration in this article.

POST /oauth2/v1/token HTTP/1.1
Host: psmdemo2.identity.cxxxx1.oxxxxdev.com
Content-Type: application/x-www-form-urlencoded
Accept: application/json
Authorization: Basic MDlCRjg0RjYzQTlENEY4MjlCOTM2REFERDVGNzk3NTlfQVBQSUQ6NzY1NDQxMjUtNDE4ZC00YzlmLTg2MzUtNTFmMjRhMjFjYjMw
Cache-Control: no-cache
Postman-Token: 55cfed4b-509c-5a6f-a415-8542d04fc7ad

grant_type=password&username=xxxxx@oracle.com&password=welcome1&scope=https://09XX11X11X9D4F829B936DADD5F79759.uscom-central-1.cxxxx1.oxxxxdev.com:443/psmdemo2-mytopicresource


Generating Bearer Token


To access OAuth protected resources, you need to retrieve an access token first.  In this example, we will work with an access token of the bearer type.

Based on the shared code snippet above, we need to send an HTTP POST request to the following URL:

https://psmdemo2.identity.cxxxx1.oxxxxdev.com/oauth2/v1/token

which is composed from the following lines of the snippet:

POST /oauth2/v1/token HTTP/1.1
Host: psmdemo2.identity.cxxxx1.oxxxxdev.com

Note that we have used https instead of http in the URL.

For the Authorization, we have specified the "Basic Auth" type with a Username and a Password; in the snippet, it shows up as below:

Authorization: Basic MDlCRjg0RjYzQTlENEY4MjlCOTM2REFERDVGNzk3NTlfQVBQSUQ6NzY1NDQxMjUtNDE4ZC00YzlmLTg2MzUtNTFmMjRhMjFjYjMw

In the "Header" part, we have specified two headers in addition to the "Authorization" header using "Bulk Edit" mode:

Content-Type:application/x-www-form-urlencoded
Accept:application/json
Authorization:Basic MDlCRjg0RjYzQTlENEY4MjlCOTM2REFERDVGNzk3NTlfQVBQSUQ6NzY1NDQxMjUtNDE4ZC00YzlmLTg2MzUtNTFmMjRhMjFjYjMw


In the "Body" part, we have copied the last line from the code snippets to it in raw mode:

grant_type=password&username=xxxxx@oracle.com&password=welcome1&scope=https://09XX11X11X9D4F829B936DADD5F79759.uscom-central-1.cxxxx1.oxxxxdev.com:443/psmdemo2-mytopicresource

Note that the above body is specific to the Oracle Identity Cloud Service (IDCS) implementation.  Similarly, the "Authorization" part requires us to specify the "Client ID" and "Client Secret" as username and password, which is also IDCS-specific.
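
If you prefer the command line, the same request can also be sketched with curl (a sketch only: replace <client-id> and <client-secret> with your own values; the other values are the placeholders from the snippet above):

curl -X POST \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -H "Accept: application/json" \
  -u "<client-id>:<client-secret>" \
  -d "grant_type=password&username=xxxxx@oracle.com&password=welcome1&scope=https://09XX11X11X9D4F829B936DADD5F79759.uscom-central-1.cxxxx1.oxxxxdev.com:443/psmdemo2-mytopicresource" \
  https://psmdemo2.identity.cxxxx1.oxxxxdev.com/oauth2/v1/token

Here the -u option performs the same Basic authentication that Postman encodes into the "Authorization" header.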

How to Use Bearer Token


To access OAuth protected resources, you specify the retrieved access token in the header of each subsequent HTTP request, in the following format:

Authorization:Bearer eyJ4NXQjUzI1Ni <snipped> M8Ei_VoT0kjc
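
For example, a curl request to a protected resource would carry the same header (the resource URL below is hypothetical):

curl -H "Authorization: Bearer eyJ4NXQjUzI1Ni <snipped> M8Ei_VoT0kjc" \
  https://<protected-resource-host>:<port>/<resource-path>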

Note that this access token will expire in one hour, as noted in the HTTP response:

"expires_in": 3600

Summary


In this article, we have demonstrated:
  • What a bearer token is
  • What an access token looks like
  • How to share a code snippet
    • We have shown that reverse-engineering the final Postman setup from a shared code snippet is not straightforward.  For example, the snippet doesn't tell us:
      • Which "Username" and "Password" to use.  In this case, we need to know that the application's "Client ID" and "Client Secret" are required.
    • Therefore, if you share code snippets with co-workers, you also need to add annotations that allow them to reproduce the HTTP requests to be sent.

Sunday, June 18, 2017

HiBench Suite―How to Build and Run the Big Data Benchmarks

As known from a previous article:
Three Benchmarks for SQL Coverage in HiBench Suite ― a Bigdata Micro Benchmark Suite
HiBench Suite is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilization.

When your big data platform (e.g., HDP) evolves, there comes a time when you need to upgrade your benchmark suite accordingly.

In this article, we will cover how to pick up the latest HiBench Suite (i.e., version 6.1) to work with Spark 2.1.



HiBench Suite


To download the master branch of HiBench Suite, you can visit its home page.  As of 06/18/2017, its latest version is 6.1.

To download, we have selected "Download ZIP" and saved it to our Linux system.


Maven


From the home page, you can select the "docs" link to view all available document links.
The build-hibench.md document tells you how to build HiBench Suite using Maven. For example, if you want to build all workloads in HiBench, you use the command below:

mvn -Dspark=2.1 -Dscala=2.11 clean package

This could be time consuming because hadoopbench (one of the workloads) relies on 3rd-party tools like Mahout and Nutch. The build process automatically downloads these tools for you. If you won't run those workloads, you can build only a specific framework (e.g., sparkbench) to speed up the build process.
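
For instance, Maven's standard reactor options can restrict the build to a single module. Assuming the Spark workloads live in a module named sparkbench (as in the HiBench source tree), a sketch would be:

# -pl selects the sparkbench module; -am also builds the modules it depends on
mvn -Dspark=2.1 -Dscala=2.11 -pl sparkbench -am clean package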

To get familiar with Maven, you can start with this pdf file.[8] In it, you will learn how to download Maven and how to set up your system to run it. Here we will just discuss some issues that we ran into while building all workloads using Maven.


Maven Installation Issues and Solutions


Proxy Server

Since our Linux system sits behind a firewall, we need to set up the following environment variables:
export http_proxy=http://your.proxy.com:80/
export https_proxy=http://your.proxy.com:80/
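
Note that these environment variables cover tools invoked from the shell (e.g., wget); Maven itself runs on the JVM and does not read them. The officially documented way is a <proxies> section in ~/.m2/settings.xml; passing the standard JVM proxy properties through MAVEN_OPTS is another commonly used approach (a sketch, with the same placeholder host and port):

# Pass JVM proxy properties to Maven; adjust host and port to your environment
export MAVEN_OPTS="-Dhttp.proxyHost=your.proxy.com -Dhttp.proxyPort=80 -Dhttps.proxyHost=your.proxy.com -Dhttps.proxyPort=80"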

Environment Setup

As instructed in the pdf file, we have set up the additional environment variables below:

export JAVA_HOME=~/JVMs/8u40_fcs
export PATH=/scratch/username/maven/apache-maven-3.5.0/bin:$PATH
export PATH=$JAVA_HOME/bin:$PATH
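
A quick way to confirm that the intended Maven and JDK are picked up:

# Should report Apache Maven 3.5.0 and the JAVA_HOME set above
mvn -version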


Maven Configuration & Debugging

POM stands for Project Object Model, which:
  • Is the fundamental unit of work in Maven
  • Is an XML file
  • Always resides in the base directory of the project as pom.xml

The POM contains information about the project and various configuration details used by Maven to build the project(s).

In the default ~/.m2/settings.xml, we have set the following entries:

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/scratch/username/.m2/repository</localRepository>
  <servers>
    <server>
      <id>central</id>
      <configuration>
        <httpConfiguration>
          <all>
            <connectionTimeout>120000</connectionTimeout>
            <readTimeout>120000</readTimeout>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>

First, we have set localRepository to a new location because of an issue described in [7,8]. Second, we have set longer timeouts for both connection and read.

If you run into issues with a plugin, you can use "help:describe" to display a list of its attributes and goals for debugging:

mvn help:describe -Dplugin=com.googlecode.maven-download-plugin:maven-download-plugin
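
If the goal listing alone is not enough, the same goal accepts a detail flag that also prints every parameter of each goal:

mvn help:describe -Dplugin=com.googlecode.maven-download-plugin:maven-download-plugin -Ddetail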

How to Run Sparkbench


To learn how to run a specific benchmark named sparkbench, you can click on the document link below:

run-sparkbench.md

Without further ado, we will focus on the configuration and tuning part of the task. For other details, please refer to the document.

New Configuration Files

In the new HiBench, there are two levels of configuration:

Global level:

${hibench.home}/conf/hibench.conf
${hibench.home}/conf/hadoop.conf
${hibench.home}/conf/spark.conf

Workload level:

${hibench.home}/conf/workloads/micro/terasort.conf
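
As an example of what these levels hold, the global conf/hibench.conf defines suite-wide properties such as the data scale profile, which workload files inherit. A line like the following (property name and values as in the HiBench documentation) selects it:

# in ${hibench.home}/conf/hibench.conf: pick a predefined data scale profile
# (tiny, small, large, huge, gigantic, bigdata)
hibench.scale.profile       large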

It has also introduced a new hierarchy (i.e., a category level such as micro, websearch, sql, etc.) to organize the workload runtime scripts:

${hibench.home}/<benchmark>/<framework>
  where <benchmark> could be:
    micro/terasort
    websearch/pagerank
    sql/aggregation
    sql/join
    sql/scan
  where <framework> could be:
    spark
    hadoop
    prepare
Similarly, the workload-specific configuration files are stored under the new category level:

${hibench.home}/conf/workloads/<benchmark.conf>
  where <benchmark.conf> could be:
    micro/terasort.conf
    websearch/pagerank.conf
    sql/aggregation.conf
    sql/join.conf
    sql/scan.conf
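
Putting the hierarchy together, a run of one workload would look roughly like the sketch below, where $HIBENCH_HOME stands for ${hibench.home} (script names and exact paths may differ in your checkout):

# prepare the input data for terasort, then run its Spark implementation
$HIBENCH_HOME/micro/terasort/prepare/prepare.sh
$HIBENCH_HOME/micro/terasort/spark/run.sh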


References

  1. HORTONWORKS DATA PLATFORM (HDP®)
  2. Readme (HiBench 6.1)
  3. HiBench Download
  4. How to build HiBench (HiBench 6.1)
  5. How to run sparkbench (HiBench 6.1)
  6. How-to documents (HiBench 6.1)
  7. Idiosyncrasies of ${HOME} that is an NFS Share (Xml and More)
  8. Apache Maven Build Tool (pdf)
  9. How do I set the location of my local Maven repository?
  10. Guide to Configuring Plug-ins (Apache Maven Project)
  11. Available Plugins (Apache Maven Project)
  12. MojoExecutionException
  13. Installing Maven Plugins (SourceForge.net)
  14. Download Plugin For Maven » 1.2.0
  15. Group: com.googlecode.maven-download-plugin

Tuesday, June 13, 2017

Linux sar Command: Using -o and -f in Pairs

System Activity Reporter (sar) is one of the most important tools for monitoring Linux servers. Using this command, you can analyze the history of different resource usages.

In this article, we will examine how to monitor the resource usage of servers (e.g., in a cluster) during the entire run of an application (e.g., a benchmark) using the following pair of sar commands:
  • Data Collection
    • nohup sar -A -o /tmp/sar.data 10 > /dev/null &
  • Record Extraction
    • sar -f /tmp/sar.data [-u | -d | -n DEV]
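
Note that, when the run length is known in advance, you do not have to background sar and kill it later: sar also accepts a count argument after the interval, so it can collect for a fixed window and exit on its own. For example:

# take one reading every 10 seconds, 360 times (one hour), then exit
sar -A -o /tmp/sar.data 10 360 > /dev/null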

Sar Command Options


In the data collection phase, we use the -o option to save data in a file in binary format; then we use the -f option, combined with other options (e.g., [-u | -d | -n DEV]), to extract records related to different statistics (e.g., CPU, I/O, network):

Main options

       -o [ filename ]
              Save the readings in the file in binary form. Each reading is in
              a separate record. The default value of the  filename  parameter
              is  the  current daily data file, the /var/log/sa/sadd file. The
              -o option is exclusive of the -f option.  All the data available
              from  the  kernel  are saved in the file (in fact, sar calls its
              data collector sadc with the option "-S ALL". See sadc(8) manual
              page).


       -f [ filename ]
              Extract records from filename (created by the -o filename flag).
              The default value of the filename parameter is the current daily
              data file, the /var/log/sa/sadd file. The -f option is exclusive
              of the -o option.

Others

       -u [ ALL ]
              Report CPU utilization. The ALL keyword indicates that  all  the
              CPU fields should be displayed.

       -d    Report activity for each block device  (kernels  2.4  and  newer
              only).

       -n { keyword [,...] | ALL }
              Report network statistics.


Monitoring the Entire Run of a Benchmark


In the illustration, we will use three benchmarks (i.e., scan / aggregation / join) in the HiBench suite as examples (see [2] for details).  At the beginning of each benchmark run, we start up sar commands on the servers of a cluster; then we run the Spark application of a specific workload; finally, we kill the sar processes at the end of the run.

run.sh
#!/bin/bash

if [ $# -ne 2 ]; then
  echo "usage: run.sh <workload> <target>"
  echo "  where <workload> could be:"
  echo "    scan"
  echo "    aggregation"
  echo "    join"
  echo "  where <target> could be:"
  echo "    mapreduce"
  echo "    spark/java"
  echo "    spark/scala"
  echo "    spark/python"
  exit 1
fi

workload=$1
target=$2
workloadsRoot=/data/hive/BDCSCE-HiBench/workloads

mkdir -p ~/$workload/$target

echo "start all sar commands ..."

./stats.sh start

while read -r vmIp
do
  echo "start stats on $vmIp"
  ./myssh opc@$vmIp "~/stats.sh start" &
done < vm.lst

# run a test in different workloads using different lang interfaces
$workloadsRoot/$workload/$target/bin/run.sh


echo "stop all sar commands ..."
./stats.sh stop

while read -r vmIp
do
  echo "stop stats on $vmIp"
  ./myssh opc@$vmIp "~/stats.sh stop" &
done < vm.lst


stats.sh

#!/bin/sh

case $1 in
  'start')
        # start a fresh collection: every 10 seconds into a binary file
        pkill sar
        rm -f /tmp/sar.data
        nohup sar -A -o /tmp/sar.data 10 > /dev/null &
        ;;
  'stop')
        # stop the collector and copy its data file to the home directory
        pkill sar
        scp /tmp/sar.data ~
        ;;
  *)
        echo "usage: $0 start|stop"
        ;;
esac

CPU Statistics


To view the overall CPU statistics, you can use the -u option as follows:

$ sar -f sar.data -u

03:39:28 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
03:39:38 PM     all      0.03      0.00      0.01      0.02      0.00     99.94
03:39:48 PM     all      0.05      0.00      0.05      0.02      0.01     99.88
<snipped>

Average:        all      0.09      0.00      0.02      0.02      0.00     99.86

I/O Statistics of Block Devices


To view the activity for each block device, you can use the -d option as follows:

$ sar -f sar.data -d


03:39:28 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
03:39:38 PM dev202-16      1.20      0.00     16.06     13.33      0.02     14.67      6.50      0.78
03:39:38 PM dev202-32      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM dev202-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM dev202-64      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM dev202-80      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-1      1.20      0.00     16.06     13.33      0.02     14.67      6.50      0.78
03:39:38 PM  dev251-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:38 PM  dev251-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
<snipped>

Average:          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
Average:    dev202-16      1.22      0.00     15.79     12.99      0.01     11.85      6.57      0.80
Average:    dev202-32      0.85      0.00      8.92     10.46      0.01     10.27      4.18      0.36
Average:    dev202-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:    dev202-64      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:    dev202-80      0.21      0.00      1.74      8.43      0.00      0.30      0.08      0.00
Average:     dev251-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:     dev251-1      1.25      0.00     15.97     12.73      0.01     11.78      6.37      0.80
Average:     dev251-2      0.90      0.00      8.92      9.88      0.01     10.44      3.95      0.36
Average:     dev251-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:     dev251-4      0.22      0.00      1.74      8.00      0.00      0.28      0.08      0.00

If you are interested in the average tps of dev251-1:
              tps 
                     Indicate  the  number  of  transfers per second that were
                     issued to the device.  Multiple logical requests  can  be
                     combined  into  a  single  I/O  request  to the device. A
                     transfer is of indeterminate size.
you can specify the following command:
$ sar -f "$destDir/sar.data" -d | grep Average  | grep dev251-1 | awk '{print $3}'

Network Statistics


To view the overall statistics of network devices like eth0, bond, etc., you can use the -n option as follows:

Syntax: 
sar -n [VALUE]
The VALUE can be:
  • DEV: For network devices like eth0, bond, etc. 
  • EDEV: For network device failure details 
  • NFS: For NFS client info 
  • NFSD: For NFS server info 
  • SOCK: For sockets in use for IPv4 
  • IP: For IPv4 network traffic 
  • EIP: For IPv4 network errors 
  • ICMP: For ICMPv4 network traffic 
  • EICMP: For ICMPv4 network errors 
  • TCP: For TCPv4 network traffic 
  • ETCP: For TCPv4 network errors 
  • UDP: For UDPv4 network traffic 
  • SOCK6, IP6, EIP6, ICMP6, UDP6 : For IPv6 
  • ALL: For all above mentioned information.

$ sar -f sar.data -n DEV

03:39:28 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
03:39:38 PM      eth0     12.35     16.47      1.34      4.04      0.00      0.00      0.00
03:39:38 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
03:39:48 PM      eth0      9.63     14.64      1.17      4.03      0.00      0.00      0.00
03:39:48 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
<snipped>

Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
Average:         eth0     11.26     16.14      3.95      6.46      0.00      0.00      0.00
Average:           lo      1.23      1.23      0.33      0.33      0.00      0.00      0.00

If you are interested in the average rxkB/s or txkB/s of eth0:
              rxkB/s
                     Total number of kilobytes received per second.

              txkB/s
                     Total number of kilobytes transmitted per second.

you can specify the following commands:
sar -f "$destDir/sar.data" -n DEV|grep Average|grep eth0 |awk '{print $5}'
sar -f "$destDir/sar.data" -n DEV|grep Average|grep eth0 |awk '{print $6}'

References

  1. sar command for Linux system performance monitoring
  2. Three Benchmarks for SQL Coverage in HiBench Suite ― a Bigdata Micro Benchmark Suite