Saturday, April 22, 2017

Apache Hive—Hive CLI vs Beeline

Lineage of Apache Hive
  1. Original model 
    • was a heavyweight command-line tool that accepted queries and executed them utilizing MapReduce
  2. Client-server model
    1. Hive CLI + HiveServer1
    2. Beeline + HiveServer2 (HS2)
In this article, we will examine the differences between Hive CLI and Beeline, especially the new Hive CLI implementation (i.e., Beeline + embedded HS2).


Hive CLI vs Beeline


Unlike Hive CLI, which is an Apache Thrift-based client, Beeline is a JDBC client based on the SQLLine CLI, although the JDBC driver it uses communicates with HiveServer2 through HiveServer2’s Thrift APIs.
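
To make the JDBC side concrete, here is a minimal shell sketch that composes the kind of JDBC URL Beeline hands to the Hive JDBC driver. The helper name hs2_jdbc_url and the host hs2.example.com are hypothetical; port 10000 and the database name default are the usual HiveServer2 defaults, but check your own cluster's settings.

```shell
# Sketch: build a HiveServer2 JDBC URL from host, port, and database.
# hs2_jdbc_url is a hypothetical helper, not part of Hive or Beeline.
hs2_jdbc_url() {
  host="${1:-localhost}"   # HiveServer2 host (assumed default)
  port="${2:-10000}"       # usual default HiveServer2 port
  db="${3:-default}"       # Hive's built-in "default" database
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Print the URL for a hypothetical server and database.
hs2_jdbc_url hs2.example.com 10000 sales
```

With a reachable HiveServer2, the resulting URL would typically be passed to Beeline through its -u option, for example: beeline -u "$(hs2_jdbc_url hs2.example.com)" -n "$USER"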

In the latest Apache Hive, both "Hive CLI" and Beeline are supported via the same launcher script, which invokes:
exec "${HIVE_HOME}/bin/hive.distro" "$@"
For example, to show the help for each command-line interface, run:

Hive CLI
$ hive --service cli --help

Beeline
$ hive --service beeline --help

Using Hive (version: 1.2.1000.2.4.2.0-258) as an example, here is the list of available services:
beeline cleardanglingscratchdir cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Note that the "beeline" command is equivalent to "hive --service beeline".

Hive CLI (New)


Because of the wide use of Hive CLI, the Hive community is replacing Hive CLI's implementation with a new Hive CLI on top of Beeline plus embedded HiveServer2 (HIVE-10511) so that the Hive community only needs to maintain a single code path.[2]

In this way, the new Hive CLI is just an alias to Beeline at two levels:
  • Shell script level 
  • High code level. 
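
At the shell-script level, the switch between the two implementations can be pictured as a simple branch. The sketch below is modeled on the USE_DEPRECATED_CLI environment variable described in [2]. The function name select_cli_class is hypothetical, and the class names are given as we understand them from the Hive source, so treat them as assumptions to verify against your Hive version.

```shell
# Sketch of a launcher-style switch between the two Hive CLI implementations.
# select_cli_class is a hypothetical helper, not part of the Hive distribution.
select_cli_class() {
  # Assumption: anything other than "false" keeps the original Hive CLI.
  if [ "${USE_DEPRECATED_CLI:-true}" != "false" ]; then
    echo "org.apache.hadoop.hive.cli.CliDriver"   # original Hive CLI (assumed class name)
  else
    echo "org.apache.hive.beeline.cli.HiveCli"    # Beeline-based Hive CLI (assumed class name)
  fi
}

# Show which entry point each setting would select (subshells keep the variable local).
(USE_DEPRECATED_CLI=true;  select_cli_class)
(USE_DEPRECATED_CLI=false; select_cli_class)
```

In this picture, exporting USE_DEPRECATED_CLI=false before running hive is what would route the familiar command to the Beeline-based implementation.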

Using JMH to measure the average time cost of retrieving a data set, the community has reported no clear performance gap between the new Hive CLI and Beeline.

Interactive Shell Commands Support

When $HIVE_HOME/bin/hive is run without either the -e or -f option, it enters interactive shell mode. To learn more, see the References section.
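
The rule for when the launcher goes interactive can be sketched as a small shell function. The name hive_cli_mode and the labels it prints are hypothetical and only model the rule stated above; the function does not call Hive itself.

```shell
# Sketch: classify how the hive launcher would run for a given argument list.
# hive_cli_mode is a hypothetical helper; it only inspects its arguments.
hive_cli_mode() {
  for arg in "$@"; do
    case "$arg" in
      -e) echo "inline-query"; return 0 ;;   # query string given on the command line
      -f) echo "script-file";  return 0 ;;   # queries read from a file
    esac
  done
  echo "interactive"                         # neither -e nor -f: interactive shell
}

hive_cli_mode
hive_cli_mode -e 'SHOW TABLES;'
hive_cli_mode -f /tmp/report.hql
```

The corresponding real invocations would be hive with no arguments, hive -e 'SHOW TABLES;', and hive -f followed by a script path (/tmp/report.hql is an example path).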

Beeline


With HiveServer2 (HS2), Beeline is the recommended command-line interface. To learn more, see the References section.

References

  1. Migrating from Hive CLI to Beeline: A Primer
  2. Replacing the Implementation of Hive CLI Using Beeline
  3. Setting up HiveServer2 (Apache Hive)
  4. Hive CLI
  5. HiveServer2 Clients (Apache) 
  6. SQLLine Manual
  7. Beeline—Command Line Shell
  8. Embedded mode
    • Running Hive client tools with embedded servers is a convenient way to test a query or debug a problem. While both Hive CLI and Beeline can embed a Hive server instance, you would start them in embedded mode in slightly different ways. 
  9. Using the Hive command line and Beeline (Book: Apache Hive Essentials)
    • For Beeline, ; is not needed after the command that starts with !.
    • When running a query in Hive CLI, the MapReduce statistics are shown in the console while processing, whereas Beeline does not show them.
    • Neither Beeline nor Hive CLI supports running a pasted query that contains <tab>, because <tab> is used for autocomplete by default in the environment. Running the query from a file avoids this issue.
    • Hive CLI shows the exact line and position of the Hive query or syntax errors when the query has multiple lines. However, Beeline processes the multiple-line query as a single line, so only the position is shown for query or syntax errors with the line number as 1 for all instances. For this aspect, Hive CLI is more convenient than Beeline for debugging the Hive query.
    • In both Hive CLI and Beeline, using the up and down arrow keys can retrieve up to 10,000 previous commands. The !history command can be used in Beeline to show all history.
    • Both Hive CLI and Beeline support variable substitution.
