Sunday, June 18, 2017

HiBench Suite―How to Build and Run the Big Data Benchmarks

As noted in a previous article:
Three Benchmarks for SQL Coverage in HiBench Suite ― a Bigdata Micro Benchmark Suite
HiBench Suite is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilization.

When your big data platform (e.g., HDP) evolves, there comes a time when you need to upgrade your benchmark suite accordingly.

In this article, we will cover how to pick up the latest HiBench Suite (i.e., version 6.1) to work with Spark 2.1.

HiBench Suite

To download the master branch of HiBench Suite, you can visit its home page. As of 06/18/2017, its latest version is 6.1.

To download, we have selected "Download ZIP" and saved it to our Linux system.


From the home page, you can select the "docs" link to view all available document links.
The build document explains how to build HiBench Suite using Maven. For example, if you want to build all workloads in HiBench, use the command below:

mvn -Dspark=2.1 -Dscala=2.11 clean package
This could be time-consuming because hadoopbench (one of the workloads) relies on third-party tools like Mahout and Nutch. The build process automatically downloads these tools for you. If you won't run these workloads, you can build only a specific framework (e.g., sparkbench) to speed up the build process.
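
As a rough sketch of that narrower build: the HiBench build document describes Maven profiles named after each framework. The example below assumes the profile is called sparkbench and wraps the command in a shell function so that nothing runs until you call it from the HiBench root directory. The function name build_sparkbench is hypothetical and not part of HiBench.

```shell
# Sketch only: build just the Spark benchmarks instead of every workload.
# Assumption: HiBench exposes a Maven profile named "sparkbench".
# build_sparkbench is a hypothetical helper name used for illustration.
build_sparkbench() {
  mvn -Psparkbench -Dspark=2.1 -Dscala=2.11 clean package
}

echo "Defined build_sparkbench; call it from the HiBench root directory."
```

Because hadoopbench is left out, the Mahout and Nutch downloads should not be triggered, which is where most of the build time goes.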

To get familiar with Maven, you can start with this pdf file. In it, you will learn how to download Maven and how to set up your system to run it. Here we will just discuss some issues that we ran into while building all workloads using Maven.

Maven Installation Issues and Solutions

Proxy Server

Since our Linux system sits behind a firewall, we need to set the following environment variables:
export http_proxy=
export https_proxy=

Environment Setup

As instructed in the pdf file, we have set up the additional environment variables below:

export JAVA_HOME=~/JVMs/8u40_fcs
export PATH=/scratch/username/maven/apache-maven-3.5.0/bin:$PATH
export PATH=$JAVA_HOME/bin:$PATH
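
After exporting these variables, a quick sanity check can confirm that both tools resolve through the new PATH. This is a minimal sketch; check_tool is a hypothetical helper, not something Maven or HiBench provides.

```shell
# Report whether a tool is reachable through the current PATH.
# check_tool is a hypothetical helper used only for illustration.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH"
    return 1
  fi
}

# Check both tools the build depends on; keep going even if one is missing.
check_tool java || true
check_tool mvn || true
```

If both are found, running mvn -version prints the Maven version together with the Java home it picked up, which confirms that JAVA_HOME took effect.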

Maven Configuration & Debugging

POM stands for Project Object Model, which
  • Is the fundamental unit of work in Maven
  • Is an XML file
  • Always resides in the base directory of the project as pom.xml.

The POM contains information about the project and various configuration details used by Maven to build the project(s).

In the default user settings file (~/.m2/settings.xml), we have set the following entries:

<settings xmlns=""

First, we have set the localRepository to a new location because of an issue described in [7,8]. Second, we have set longer timeouts for both connection and read.
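
The full settings file is not reproduced here, so the following is only a minimal sketch of what such entries could look like, not the file we actually used. The repository path, the server id central, and the two-minute timeout values are all assumptions for illustration, and the element names follow the Maven HTTP Wagon configuration guide as we understand it, so check them against your Maven version. The sketch writes to a scratch file rather than touching ~/.m2/settings.xml.

```shell
# Write an illustrative settings file to a scratch location for review.
SKETCH="$(mktemp)"
cat > "$SKETCH" <<'EOF'
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <!-- Assumed path, for illustration only -->
  <localRepository>/scratch/username/maven/repository</localRepository>
  <servers>
    <server>
      <!-- Assumed id: must match the id of the repository being accessed -->
      <id>central</id>
      <configuration>
        <httpConfiguration>
          <all>
            <!-- Assumed values, in milliseconds -->
            <connectionTimeout>120000</connectionTimeout>
            <readTimeout>120000</readTimeout>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
EOF
echo "Sketch written to $SKETCH"
```

Review the generated file and merge only the entries you need into your own settings.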

If you run into issues with a plugin, you can use "help:describe" to display a list of its attributes and goals for debugging:
mvn help:describe -Dplugin=com.googlecode.maven-download-plugin:maven-download-plugin
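
To narrow the output to a single goal, help:describe also accepts a goal name and a detail flag. The sketch below wraps that in a function; describe_goal is a hypothetical helper name, and wget is assumed to be the goal of interest in the download plugin.

```shell
# Hypothetical helper: describe one goal of a plugin in detail.
# $1 = plugin as groupId:artifactId, $2 = goal name
describe_goal() {
  mvn help:describe -Dplugin="$1" -Dgoal="$2" -Ddetail
}

echo "Example: describe_goal com.googlecode.maven-download-plugin:maven-download-plugin wget"
```

Adding -e (full error stack traces) or -X (debug output) to a failing mvn command is another common way to get more detail.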

How to Run Sparkbench

To learn how to run the Spark benchmarks (sparkbench), refer to the "How to run sparkbench" document [5].
Without further ado, we will focus on the configuration and tuning part of the task. For other details, please refer to the document.

New Configuration Files

In the new HiBench, there are two levels of configuration:

(Global level)

(Workload level)

It has also introduced a new hierarchy (i.e., a category such as micro, websearch, or sql) to organize workload runtime scripts:
  where <benchmark> could be:
  where <framework> could be:
Similarly, workload-specific configuration files are stored under the new category level:

  where <benchmark.conf> could be:
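
As a separate illustration of the hierarchy, the sketch below composes script and configuration paths from a category, a workload, and a framework. The directory layout it encodes (bin/workloads/... and conf/workloads/...) is an assumption based on the HiBench 6.x run documents rather than something spelled out in this article, and the helper names workload_script and workload_conf are hypothetical.

```shell
# Assumed layout, relative to the HiBench root directory:
#   bin/workloads/<category>/<workload>/prepare/prepare.sh
#   bin/workloads/<category>/<workload>/<framework>/run.sh
#   conf/workloads/<category>/<workload>.conf
workload_script() {
  # $1 = category (e.g. micro), $2 = workload (e.g. wordcount),
  # $3 = "prepare" or a framework name such as "spark"
  if [ "$3" = "prepare" ]; then
    echo "bin/workloads/$1/$2/prepare/prepare.sh"
  else
    echo "bin/workloads/$1/$2/$3/run.sh"
  fi
}

workload_conf() {
  # $1 = category, $2 = workload
  echo "conf/workloads/$1/$2.conf"
}

# Print the files involved in a wordcount run on Spark, in the order used.
workload_conf micro wordcount
workload_script micro wordcount prepare
workload_script micro wordcount spark
```

Under these assumptions, a run consists of editing the workload configuration if needed, running the prepare script to generate input data, and then running the framework-specific script.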


  2. Readme (HiBench 6.1)
  3. HiBench Download
  4. How to build HiBench (HiBench 6.1)
  5. How to run sparkbench (HiBench 6.1)
  6. How-to documents (HiBench 6.1)
  7. Idiosyncrasies of ${HOME} that is an NFS Share (Xml and More)
  8. Apache Maven Build Tool (pdf)
  9. How do I set the location of my local Maven repository?
  10. Guide to Configuring Plug-ins (Apache Maven Project)
  11. Available Plugins (Apache Maven Project)
  12. MojoExecutionException
  13. Installing Maven Plugins (
  14. Download Plugin For Maven » 1.2.0
  15. Group: com.googlecode.maven-download-plugin

