Saturday, January 28, 2017

Apache Ambari: Knowing the Basics

Apache Ambari provides an end-to-end management and monitoring application for Apache Hadoop. In a nutshell, it can be used for:
  • Managing most of the administration activities in a Hadoop cluster
    • To install, provision, deploy, manage, and monitor a Hadoop cluster
    • To hide the complexity of Hadoop cluster management
    • To provide an easy, intuitive web UI
  • Integrating with external tools for better management via its RESTful APIs
In this article, we will use Apache Ambari (Version 2.2.2.0) from the Hortonworks Data Platform (HDP) in our discussion.


Hortonworks Data Platform


You can deploy Hortonworks Data Platform (HDP) with or without Apache Ambari. If you choose not to use Ambari, follow the Non-Ambari Cluster Installation Guide (reference 1). However, deploying the Apache Hadoop stack is much easier with Ambari (see Installing Hadoop Using Ambari, reference 2).

After initial installation and deployment, your Apache Hadoop cluster may still grow and change with use over time. With Apache Ambari, you can quickly add new services or expand the storage and processing capacity of the cluster.

The ecosystem of Ambari consists of three main components:
  • Ambari Web
  • Ambari Server
    •  Serves as the collection point for data from across the cluster
  • Ambari Agent
    • Runs on each host in the cluster so that the Ambari Server can control it


Ambari Web


Using the Ambari Web UI and REST APIs, you can deploy, operate, manage configuration changes, and monitor services for all nodes in your cluster from a central point.

Ambari Web is a client-side JavaScript application, which calls the Ambari REST API (accessible from the Ambari Server) to access cluster information and perform cluster operations. A relational database is used to store the information about the cluster configuration and topology.
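As a rough illustration of the kind of call Ambari Web makes, the Python sketch below builds (but does not send) a GET request for the REST API's cluster list at /api/v1/clusters. The host name, the admin/admin credentials, and the helper name build_clusters_request are illustrative assumptions and do not come from this article.

```python
import base64
import urllib.request


def build_clusters_request(host, user, password, port=8080):
    """Build a GET request for the Ambari REST API cluster list.

    Host and credentials are placeholders supplied by the caller.
    """
    url = "http://{}:{}/api/v1/clusters".format(host, port)
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    request = urllib.request.Request(url, method="GET")
    # The API is accessed with HTTP Basic authentication.
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    # Needed on modifying calls (POST/PUT/DELETE); harmless on a GET.
    request.add_header("X-Requested-By", "ambari")
    return request


if __name__ == "__main__":
    req = build_clusters_request("ambari.example.com", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually send it against a live server:
    #   urllib.request.urlopen(req).read()
```

Building the request separately from sending it keeps the sketch usable without a running Ambari Server.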



With Ambari Views, you can customize the Ambari Web UI. Ambari Views offer a systematic way to plug in UI capabilities that surface custom visualization, management, and monitoring features in Ambari Web.

Ambari Server


Before starting the Ambari Server for the first time, you must run setup once. Setup configures Ambari to talk to the Ambari database, installs the JDK, and lets you customize the user account (default: root) that the Ambari Server daemon runs as.

After setup, all the configuration is stored in: 
  • /etc/ambari-server/conf/ambari.properties
Then you can run the following commands from the Ambari Server host:
  • ambari-server start
    • If you reboot your cluster, you must restart the Ambari Server and all the Ambari Agents manually.
  • ambari-server status
  • ambari-server stop
Once started, you can access Ambari using the following URL:
http://{ambari-server-hostname}:8080
from a web browser.
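If the port has been changed from the default, the URL can be derived from the properties file. The sketch below is a minimal example that parses key=value lines and builds the Ambari URL. It assumes that client.api.port is the property that overrides port 8080, which you should verify against your own ambari.properties; the function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def ambari_web_url(hostname, props):
    """Build the Ambari URL, falling back to the default port 8080."""
    # Assumption: client.api.port is the override for the web/API port.
    port = props.get("client.api.port", "8080")
    return "http://{}:{}".format(hostname, port)


if __name__ == "__main__":
    with open("/etc/ambari-server/conf/ambari.properties") as handle:
        properties = parse_properties(handle.read())
    print(ambari_web_url("ambari.example.com", properties))
```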

The start script /usr/sbin/ambari-server is a shell script that sets environment variables and kicks off a Python script, which in turn launches a Java process (see details here).

Troubleshooting

You can start Ambari in debug mode to get more detailed output:
ambari-server start --verbose --debug
# or for short
ambari-server start -v -g
Significant files/directories:
  • /var/log/ambari-server/ambari-server.log 
    • To monitor Ambari, follow the log as it is written:
      • tail -f /var/log/ambari-server/ambari-server.log
  • /var/lib/ambari-server/resources/
    • Contains the SQL scripts used to initialize the PostgreSQL database
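When the log is long, filtering it can be quicker than reading it with tail. The sketch below is one possible way to pull out lines that mention ERROR or WARN. It assumes each log line carries its level as a separate word, which you should check against your own log; the function name find_problems is hypothetical.

```python
import re

# Assumption: the log level appears as a standalone word on each line.
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARN)\b")


def find_problems(lines):
    """Return (line_number, text) pairs for lines mentioning ERROR or WARN."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if LEVEL_PATTERN.search(line):
            matches.append((number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    with open("/var/log/ambari-server/ambari-server.log") as handle:
        for number, text in find_problems(handle):
            print("{}: {}".format(number, text))
```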



Ambari Agent


Ambari Agents send a heartbeat to the master (the Ambari Server) every few seconds and receive commands from the master in the heartbeat responses. Heartbeat responses are the only way for the master to send a command to the Agent. Each command is placed in the action queue, where it is picked up by the action executioner.

The action executioner picks the right tool (Puppet, Python, etc.) for execution depending on the command type and action type. The actions sent in the heartbeat response are therefore processed asynchronously at the Agent. The action executioner puts the response or progress messages on the message queue, and the Agent sends everything on the message queue to the master in the next heartbeat.
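The flow described above can be modeled in a few lines of Python. This is a simplified sketch and not the Agent's actual code: the class names FakeServer and Agent, the command strings, and the "done:" report format are all invented for illustration, and the steps run sequentially rather than asynchronously so that the behavior stays easy to follow.

```python
from collections import deque


class FakeServer:
    """Stand-in for the master: hands out commands in heartbeat responses."""

    def __init__(self, commands):
        self.pending = deque(commands)
        self.received = []

    def heartbeat(self, reports):
        # Record what the Agent reported, then reply with all waiting commands.
        self.received.extend(reports)
        commands = list(self.pending)
        self.pending.clear()
        return commands


class Agent:
    def __init__(self, server):
        self.server = server
        self.action_queue = deque()   # commands waiting to be executed
        self.message_queue = deque()  # results waiting to be reported

    def heartbeat(self):
        # Send everything on the message queue, then queue the reply's commands.
        reports = list(self.message_queue)
        self.message_queue.clear()
        for command in self.server.heartbeat(reports):
            self.action_queue.append(command)

    def run_actions(self):
        # Plays the role of the action executioner between heartbeats.
        while self.action_queue:
            command = self.action_queue.popleft()
            self.message_queue.append("done: " + command)


if __name__ == "__main__":
    server = FakeServer(["install", "start"])
    agent = Agent(server)
    agent.heartbeat()     # commands arrive in the heartbeat response
    agent.run_actions()   # results land on the message queue
    agent.heartbeat()     # results travel to the master
    print(server.received)
```

Note that the results of a command only reach the master one heartbeat after the command was delivered, which matches the description above.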

To install the Ambari Agent manually on RHEL/CentOS/Oracle Linux 6:
  1. Install the Ambari Agent on every host in your cluster.
    • yum install ambari-agent
  2. Using a text editor, configure the Ambari Agent by editing the ambari-agent.ini file as shown below:
    • vi /etc/ambari-agent/conf/ambari-agent.ini
      [server]
      hostname=
      url_port=8440
      secured_url_port=8441
  3. Start the Agent on every host in your cluster.
    • ambari-agent start
      • The Agent registers with the Server on start.
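When many hosts are involved, step 2 can be scripted instead of edited by hand. The following sketch renders only the [server] section shown above using Python's configparser. The helper name render_agent_ini and the example host name are assumptions, and a real ambari-agent.ini contains other sections that this sketch does not reproduce, so treat the output as a fragment rather than a complete file.

```python
import configparser
import io


def render_agent_ini(server_hostname):
    """Render the [server] section of ambari-agent.ini as text."""
    config = configparser.ConfigParser()
    config["server"] = {
        "hostname": server_hostname,
        "url_port": "8440",
        "secured_url_port": "8441",
    }
    buffer = io.StringIO()
    # Write key=value without spaces around "=", as in the listing above.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_agent_ini("ambari.example.com"))
```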
The Agent should not die if the master suddenly disappears. It should continue to poll at regular intervals and recover as needed when the master comes back up. The Agent should also keep all the information it planned to send to the master in case of a connection failure and re-send it after the master comes back up. It may need to re-register if it was in the process of registering when the connection was lost.
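The "keep and re-send" behavior can be sketched as a small buffer that discards reports only after a successful delivery. This is an illustration of the idea and not the Agent's implementation: the class name ReportBuffer and the convention that the send callable raises ConnectionError when the master is unreachable are assumptions.

```python
class ReportBuffer:
    """Holds reports until the master has accepted them."""

    def __init__(self, send):
        # send: callable taking a list of reports; assumed to raise
        # ConnectionError when the master cannot be reached.
        self.send = send
        self.pending = []

    def add(self, report):
        self.pending.append(report)

    def flush(self):
        """Try to deliver all pending reports; keep them on failure."""
        if not self.pending:
            return True
        try:
            self.send(list(self.pending))
        except ConnectionError:
            return False  # nothing is dropped; retry at the next poll
        self.pending.clear()
        return True


if __name__ == "__main__":
    delivered = []
    master_up = {"value": False}

    def send(reports):
        if not master_up["value"]:
            raise ConnectionError("master unreachable")
        delivered.extend(reports)

    buffer = ReportBuffer(send)
    buffer.add("status: ok")
    print(buffer.flush())      # master is down, report is kept
    master_up["value"] = True
    print(buffer.flush())      # master is back, report is delivered
    print(delivered)
```

Calling flush on every poll interval gives the retry loop described above without any extra bookkeeping.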

Troubleshooting
  • The first thing to do if you run into trouble is to find the logs. Ambari Agent logs can be found at 
    • /var/log/ambari-agent/ambari-agent.log


References

  1. Non-Ambari Cluster Installation Guide (HDP)
  2. Installing Hadoop Using Ambari
  3. Deploying, Managing and Configuring HDP with Ambari 1.7 (tutorial)
  4. Understanding the Basics
  5. Ambari Architecture (pdf)
  6. How can I start my Ambari heartbeat?
  7. Installing Ambari Agents Manually
  8. Introducing Apache Ambari for Deploying and Managing Apache Hadoop (Hortonworks)
  9. Ambari Admin Guide (Version 2.2.2.0) 
  10. Ambari Reference Guide (Version 2.2.2.0) 
  11. Ambari User’s Guide (Version 2.2.2.0) 
  12. Ambari Troubleshooting Guide (Version 2.2.2.0) 
  13. Ambari Security Guide (Version 2.2.2.0) 
  14. Automated Install with Ambari  (Version 2.2.2.0) 
  15. Ambari Upgrade Guide (Version 2.2.2.0) 
  16. Install, Configure, and Deploy an HDP Cluster 
  17. Ambari Agent certificates (to be removed if you need to update the Agent)
    • /var/lib/ambari-agent/keys/*
  18. Blueprint Support for HA Clusters (Apache Ambari)
  19. Ambari Metrics System ("AMS")
    • A system for collecting, aggregating and serving Hadoop and system metrics in Ambari-managed clusters
  20. All Cloud-related articles on Xml and More
  21. Installing Spark Using Ambari (HDP-2.5.3)
