As discussed in a previous article:
Three Benchmarks for SQL Coverage in HiBench Suite ― a Bigdata Micro Benchmark Suite
HiBench Suite is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput, and system resource utilization.
When your big data platform (e.g., HDP) evolves, there comes a time when you need to upgrade your benchmark suite accordingly.
In this article, we will cover how to pick up the latest HiBench Suite (i.e., version 6.1) and make it work with Spark 2.1.
HiBench Suite
To download the master branch of HiBench Suite, you can visit its home page here. As of 06/18/2017, its latest version is 6.1.
To download, we selected "Download ZIP" and saved the archive to our Linux system.
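For example, unpacking the downloaded archive on Linux looks like this (the archive name below assumes GitHub's default naming for the master branch, so adjust it to whatever you saved):
# Unpack the GitHub ZIP and move into the source tree (archive name may differ)
unzip HiBench-master.zip
cd HiBench-master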
Maven
From the home page, you can select the "docs" link to view all available documentation links:
The build-hibench.md link tells you how to build HiBench Suite using Maven. For example, to build all workloads in HiBench, use the command below:
mvn -Dspark=2.1 -Dscala=2.11 clean package
This can be time consuming because hadoopbench (one of the workloads) relies on 3rd-party tools such as Mahout and Nutch, which the build process downloads automatically. If you won't run those workloads, you can build only a specific framework (e.g., sparkbench) to speed up the build. To get familiar with Maven, you can start with this pdf file, which explains how to download Maven and how to set up your system to run it. Here we will just discuss some issues that we ran into while building all workloads with Maven.
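For example, build-hibench.md describes per-framework Maven profiles; a build of only the Spark workloads would look roughly like the following (the profile name is taken from the HiBench build doc, so verify it against your version):
# Build only sparkbench against Spark 2.1 / Scala 2.11 (skips the hadoopbench downloads)
mvn -Psparkbench -Dspark=2.1 -Dscala=2.11 clean package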
Maven Installation Issues and Solutions
Proxy Server
Since our Linux system sits behind a firewall, we need to set the following environment variables:
export http_proxy=http://your.proxy.com:80/
export https_proxy=http://your.proxy.com:80/
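Note that these variables cover tools invoked during the build (e.g., wget); Maven's own dependency downloads instead read proxy settings from ~/.m2/settings.xml. A minimal sketch, with host and port as placeholders mirroring the values above:
<!-- Proxy entry for ~/.m2/settings.xml; host/port are placeholders -->
<proxies>
  <proxy>
    <id>corporate-proxy</id>
    <active>true</active>
    <protocol>http</protocol>
    <host>your.proxy.com</host>
    <port>80</port>
  </proxy>
</proxies>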
Environment Setup
As instructed in the pdf file, we have set up the following additional environment variables:
export JAVA_HOME=~/JVMs/8u40_fcs
export PATH=/scratch/username/maven/apache-maven-3.5.0/bin:$PATH
export PATH=$JAVA_HOME/bin:$PATH
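To confirm that Maven and the intended JDK are picked up from the PATH, you can run:
# Prints the Maven version, Java version, and Java home in effect
mvn -version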
Maven Configuration & Debugging
POM stands for Project Object Model. A POM:
- Is the fundamental unit of work in Maven
- Is an XML file
- Always resides in the base directory of the project as pom.xml
The POM contains information about the project and the various configuration details used by Maven to build the project(s).
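For reference, a minimal pom.xml has the following shape (an illustrative skeleton with made-up coordinates, not HiBench's actual POM):
<!-- Illustrative minimal POM; groupId/artifactId/version are placeholders -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo-app</artifactId>
  <version>1.0-SNAPSHOT</version>
</project>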
In the default ~/.m2/settings.xml, we have set the following entries:
First, we have set localRepository to a new location because of an issue described here.[7,8] Second, we have set longer timeouts for both connection and read.
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/scratch/username/.m2/repository</localRepository>
  <servers>
    <server>
      <id>central</id>
      <configuration>
        <httpConfiguration>
          <all>
            <connectionTimeout>120000</connectionTimeout>
            <readTimeout>120000</readTimeout>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
If you run into issues with a plugin, you can use "help:describe":
mvn help:describe -Dplugin=com.googlecode.maven-download-plugin:maven-download-plugin
to display a list of its attributes and goals for debugging.
How to Run Sparkbench
To learn how to run a specific benchmark named sparkbench, you can click on the document link below:
run-sparkbench.md
Without further ado, we will focus on the configuration and tuning part of the task. For other details, please refer to the document.
New Configuration Files
In the new HiBench, there are two levels of configuration:
(Global level)
- ${hibench.home}/conf/hibench.conf
- ${hibench.home}/conf/hadoop.conf
- ${hibench.home}/conf/spark.conf
(Workload level)
- ${hibench.home}/conf/workloads/micro/terasort.conf
It has also introduced a new hierarchy (i.e., a category level such as micro, websearch, sql, etc.) to organize the workload runtime scripts (see the example after these lists):
${hibench.home}/<benchmark>/<framework>
where <benchmark> could be:
- micro/terasort
- websearch/pagerank
- sql/aggregation
- sql/join
- sql/scan
and <framework> could be:
- spark
- hadoop
- prepare
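For example, to prepare input data and then run TeraSort on Spark with this layout, the invocation looks roughly like the following (a sketch; in HiBench 6.x the scripts sit under ${hibench.home}/bin/workloads, so adjust the prefix to your tree):
# Generate TeraSort input, then run the Spark workload
bin/workloads/micro/terasort/prepare/prepare.sh
bin/workloads/micro/terasort/spark/run.sh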
Similarly, the workload-specific configuration files are stored under the new category level (an illustrative snippet follows this list):
${hibench.home}/conf/workloads/<benchmark.conf>
where <benchmark.conf> could be:
- micro/terasort.conf
- websearch/pagerank.conf
- sql/aggregation.conf
- sql/join.conf
- sql/scan.conf
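For illustration only, a workload-level file typically maps the global hibench.scale.profile setting to a concrete input size; the property names and values below are hypothetical, so check your own terasort.conf:
# Hypothetical sketch of conf/workloads/micro/terasort.conf
hibench.terasort.tiny.datasize     32000
hibench.terasort.large.datasize    3200000000
hibench.workload.datasize          ${hibench.terasort.${hibench.scale.profile}.datasize}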
References
- HORTONWORKS DATA PLATFORM (HDP®)
- Readme (HiBench 6.1)
- HiBench Download
- How to build HiBench (HiBench 6.1)
- How to run sparkbench (HiBench 6.1)
- How-to documents (HiBench 6.1)
- Idiosyncrasies of ${HOME} that is an NFS Share (Xml and More)
- Apache Maven Build Tool (pdf)
- How do I set the location of my local Maven repository?
- Guide to Configuring Plug-ins (Apache Maven Project)
- Available Plugins (Apache Maven Project)
- MojoExecutionException
- Installing Maven Plugins (SourceForge.net)
- Download Plugin For Maven » 1.2.0
- Group: com.googlecode.maven-download-plugin