Saturday, January 28, 2017

Apache Ambari: Knowing the Basics

Apache Ambari provides an end-to-end management and monitoring application for Apache Hadoop. In a nutshell, it can be used for:
  • Managing most of the administration activities in a Hadoop cluster
    • Installing, provisioning, deploying, managing, and monitoring a Hadoop cluster
    • Hiding the complexity of Hadoop cluster management
    • Providing an easy and intuitive web UI
  • Integrating with other external tools for better management via its RESTful APIs
In this article, we will use the version of Apache Ambari from the Hortonworks Data Platform (HDP) in our discussion.

Hortonworks Data Platform

You can deploy the Hortonworks Data Platform (HDP) with or without Apache Ambari. If you choose not to use Ambari, you can follow the Non-Ambari Cluster Installation Guide (reference 1 below). However, it is much easier to deploy the Apache Hadoop stack with Ambari (see reference 2 below).

After initial installation and deployment, your Apache Hadoop cluster can still grow and change with use over time. With Apache Ambari, you can easily and quickly add new services or expand the storage and processing capacity of the cluster.

The ecosystem of Ambari consists of three main components:
  • Ambari Web
  • Ambari Server
    • Serves as the collection point for data from across the cluster
  • Ambari Agent
    • Runs on each host in the cluster to allow the Ambari Server to control it

Ambari Web

Using the Ambari Web UI and REST APIs, you can deploy, operate, manage configuration changes, and monitor services for all nodes in your cluster from a central point.

Ambari Web is a client-side JavaScript application, which calls the Ambari REST API (accessible from the Ambari Server) to access cluster information and perform cluster operations. A relational database is used to store the information about the cluster configuration and topology.
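
To give a rough idea of the kind of call Ambari Web makes, here is a minimal Python sketch that builds (but does not send) a request for the list of clusters. The host name ambari.example.com, the admin/admin credentials, and the helper name build_clusters_request are placeholders chosen for illustration. Port 8080 and the /api/v1/clusters path reflect a default Ambari Server setup and may differ in your environment.

```python
import base64
import urllib.request


def build_clusters_request(host, user, password, port=8080):
    """Build a GET request for the Ambari REST endpoint that lists clusters."""
    url = f"http://{host}:{port}/api/v1/clusters"
    # The request carries HTTP Basic authentication credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder host and credentials; nothing is sent over the network here.
req = build_clusters_request("ambari.example.com", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen against a running Ambari Server would send it and return a JSON document describing the clusters.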

With Ambari Views, you can customize the Ambari Web UI. Ambari Views offer a systematic way to plug in UI capabilities that surface custom visualization, management, and monitoring features in Ambari Web.

Ambari Server

Before starting the Ambari Server for the first time, you must set it up once. Setup configures Ambari to talk to the Ambari database, installs the JDK, and allows you to customize the user account (default: root) that the Ambari Server daemon runs as.

After setup, all the configuration is stored in: 
  • /etc/ambari-server/conf/
Then you can run the following commands from the Ambari Server host:
  • ambari-server start
    • If you reboot your cluster, you must restart the Ambari Server and all the Ambari Agents manually.
  • ambari-server status
  • ambari-server stop
Once started, you can access Ambari using the following URL:
from a web browser.

The start script /usr/sbin/ambari-server is a shell script that sets environment variables and kicks off a Python script, which in turn kicks off a Java process (see details here).


You can start Ambari in debug mode to get more detailed output via:
ambari-server start --verbose --debug
# or for short
ambari-server start -v -g
Significant files/directories:
  • /var/log/ambari-server/ambari-server.log 
    • To monitor Ambari, run:
      • tail -f /var/log/ambari-server/ambari-server.log
  • /var/lib/ambari-server/resources/ 
    • Contains the SQL scripts used to initialize the PostgreSQL (psql) database

Ambari Agent

Ambari Agents heartbeat to the master (the Ambari Server) every few seconds and receive commands from the master in the heartbeat responses. Heartbeat responses are the only way for the master to send a command to an Agent. Each command is queued in the action queue, from which it is picked up by the action executioner.

The action executioner picks the right tool (Puppet, Python, etc.) for execution depending on the command type and action type. Thus, the actions sent in the heartbeat response are processed asynchronously at the Agent. The action executioner puts the response or progress messages on the message queue, and the Agent sends everything on the message queue to the master in the next heartbeat.
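
The heartbeat flow described above can be sketched as a toy simulation in Python. This is not Ambari's actual code: the Server and Agent classes, their method names, and the "INSTALL DATANODE" command string are all made up for illustration. It only shows how commands travel in heartbeat responses and how results travel back on the next heartbeat.

```python
from collections import deque


class Server:
    """Toy stand-in for the Ambari Server (the "master")."""

    def __init__(self):
        self.pending_commands = deque()  # commands waiting for the next heartbeat
        self.received_reports = []       # results reported by the Agent

    def handle_heartbeat(self, reports):
        # Record whatever the Agent reported, then answer with queued commands.
        self.received_reports.extend(reports)
        commands = list(self.pending_commands)
        self.pending_commands.clear()
        return commands


class Agent:
    """Toy stand-in for an Ambari Agent."""

    def __init__(self, server):
        self.server = server
        self.action_queue = deque()   # commands received, not yet executed
        self.message_queue = deque()  # results waiting to be reported

    def heartbeat(self):
        # Send everything on the message queue; commands arrive in the response.
        reports = list(self.message_queue)
        self.message_queue.clear()
        for command in self.server.handle_heartbeat(reports):
            self.action_queue.append(command)

    def run_actions(self):
        # The executor drains the action queue separately from the heartbeat.
        while self.action_queue:
            command = self.action_queue.popleft()
            self.message_queue.append(f"{command}: COMPLETED")


server = Server()
agent = Agent(server)
server.pending_commands.append("INSTALL DATANODE")  # hypothetical command

agent.heartbeat()    # heartbeat 1: the Agent receives the command
agent.run_actions()  # the executor processes it between heartbeats
agent.heartbeat()    # heartbeat 2: the Agent reports the result
print(server.received_reports)  # ['INSTALL DATANODE: COMPLETED']
```

Note that the Server never contacts the Agent directly: the command only reaches the Agent because the Agent asked, which matches the "heartbeat responses are the only way" rule.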

To install the Ambari Agent manually on RHEL/CentOS/Oracle Linux 6, follow these steps:
  1. Install the Ambari Agent on every host in your cluster.
    • yum install ambari-agent
  2. Using a text editor, configure the Ambari Agent by editing the ambari-agent.ini file as shown below:
    • vi /etc/ambari-agent/conf/ambari-agent.ini
      [server]
      hostname=
  3. Start the Agent on every host in your cluster.
    • ambari-agent start
      • The Agent registers with the Server on start.
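
If you have many hosts, editing the file by hand gets tedious. Here is a minimal Python sketch of how step 2 could be scripted, assuming the file is a standard INI file with a [server] section. The helper name set_server_hostname, the sample starting content, and the host name ambari.example.com are placeholders for illustration.

```python
import configparser
import io


def set_server_hostname(ini_text, hostname):
    """Return ini_text with the [server] hostname option set to hostname."""
    # Interpolation is disabled so that any '%' in existing values is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "hostname", hostname)
    output = io.StringIO()
    parser.write(output, space_around_delimiters=False)
    return output.getvalue()


# Hypothetical starting content and host name, for illustration only.
original = "[server]\nhostname=localhost\n"
updated = set_server_hostname(original, "ambari.example.com")
print(updated)
```

Be aware that configparser drops comments when it rewrites a file, so for a one-off change the text editor approach above is still the safer choice.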
The Agent should not die if the master suddenly disappears. It should continue to poll at regular intervals and recover as needed when the master comes back up.
In case of a connection failure, the Ambari Agent should keep all the information it planned to send to the master and re-send it after the master comes back up. It may need to re-register if it was in the process of registering when the connection failed.
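
The "keep and re-send" behavior can be sketched in a few lines of Python. Again, this is only an illustration under assumptions, not the Agent's real implementation: the BufferedReporter class, the send_to_master function, and the "DATANODE STARTED" message are hypothetical.

```python
class BufferedReporter:
    """Keeps reports until they have been delivered to the master."""

    def __init__(self, send):
        self.send = send   # callable that raises ConnectionError on failure
        self.pending = []  # reports not yet delivered

    def report(self, message):
        self.pending.append(message)

    def flush(self):
        """Try to deliver all pending reports; keep them if the master is down."""
        try:
            self.send(list(self.pending))
        except ConnectionError:
            return False   # master unreachable: retry on the next poll
        self.pending.clear()
        return True


# Simulated master that is down for the first attempt, then comes back.
delivered = []
master = {"up": False}


def send_to_master(messages):
    if not master["up"]:
        raise ConnectionError("master unreachable")
    delivered.extend(messages)


reporter = BufferedReporter(send_to_master)
reporter.report("DATANODE STARTED")  # hypothetical status message
first_attempt = reporter.flush()     # fails, the message is retained
master["up"] = True
second_attempt = reporter.flush()    # succeeds, the message is delivered
print(first_attempt, second_attempt, delivered)
```

The key design point is that the pending list is cleared only after a successful send, so a failed attempt never loses information.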

  • The first thing to do if you run into trouble is to find the logs. Ambari Agent logs can be found at:
    • /var/log/ambari-agent/ambari-agent.log


  1. Non-Ambari Cluster Installation Guide (HDP)
  2. Installing Hadoop Using Ambari
  4. Understanding the Basics
  5. Ambari Architecture (pdf)
  6. How can I start my Ambari heartbeat?
  7. Installing Ambari Agents Manually
  9. Ambari Admin Guide (Version 
  10. Ambari Reference Guide (Version 
  11. Ambari User’s Guide (Version 
  12. Ambari Troubleshooting Guide (Version 
  13. Ambari Security Guide (Version 
  14. Automated Install with Ambari  (Version 
  15. Ambari Upgrade Guide (Version 
  16. Install, Configure, and Deploy an HDP Cluster 
  17. Ambari Agent certificates (to be removed if you need to update the Agent)
    • /var/lib/ambari-agent/keys/*
  18. Blueprint Support for HA Clusters (Apache Ambari)
  19. Ambari Metrics System ("AMS")
    • A system for collecting, aggregating and serving Hadoop and system metrics in Ambari-managed clusters
  20. All Cloud-related articles on Xml and More
  21. Installing Spark Using Ambari (HDP-2.5.3)
