Saturday, April 22, 2017

Apache Hive—Hive CLI vs Beeline

Lineage of Apache Hive
  1. Original model
    • A heavyweight command-line tool that accepted queries and executed them using MapReduce
  2. Client-server model
    1. Hive CLI + HiveServer1
    2. Beeline + HiveServer2 (HS2)
In this article, we examine the differences between Hive CLI and Beeline, with particular attention to the new Hive CLI implementation (i.e., Beeline + embedded HS2).


Hive CLI vs Beeline


Hive CLI is an Apache Thrift-based client, whereas Beeline is a JDBC client based on the SQLLine CLI, although the JDBC driver it uses communicates with HiveServer2 through HiveServer2's Thrift APIs.
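Because Beeline is a JDBC client, you point it at HiveServer2 with a JDBC URL of the form jdbc:hive2://host:port/database. The sketch below is a hypothetical helper (hs2_beeline_cmd is not part of Hive) that only prints the Beeline command it would run; the host name, the port 10000 (the usual HiveServer2 default) and the database "default" are assumptions to adjust for your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) a Beeline command that connects
# to a remote HiveServer2 over JDBC and runs a single query with -e.
hs2_beeline_cmd() {
  local host="$1" query="$2"
  local port="${3:-10000}"    # assumed HiveServer2 port (10000 is the common default)
  local db="${4:-default}"    # assumed database name
  printf 'beeline -u "jdbc:hive2://%s:%s/%s" -e "%s"\n' \
    "$host" "$port" "$db" "$query"
}

hs2_beeline_cmd hs2.example.com "SHOW TABLES"
```

Keeping the helper as a dry run makes it easy to inspect the exact URL before pointing Beeline at a real server.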

In the latest Apache Hive, both Hive CLI and Beeline are supported through the hive launcher script, which runs:
exec "${HIVE_HOME}/bin/hive.distro" "$@"
For example, to launch either command-line interface, run:

Hive CLI
$ hive --service cli --help

Beeline

$ hive --service beeline --help

Using Hive (version: 1.2.1000.2.4.2.0-258) as an example, here is the list of available services:
beeline cleardanglingscratchdir cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Note that the "beeline" command is equivalent to "hive --service beeline".
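To make that equivalence concrete, here is a toy model of the dispatch written as shell functions. toy_hive and toy_beeline are hypothetical names, and the sketch is far simpler than the real $HIVE_HOME/bin/hive script: it only recognizes --service as the first argument, skips all environment and classpath setup, and just echoes which service would run. Falling back to cli when no --service is given is an assumption based on Hive 1.x behavior.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical) of how "hive --service NAME" selects a service.
# Only a leading "--service NAME" pair is recognized; everything else is
# passed through as arguments to the chosen service.
toy_hive() {
  local service=""
  if [ "$1" = "--service" ]; then
    service="$2"
    shift 2
  fi
  service="${service:-cli}"   # assumed default: no --service means Hive CLI
  echo "service=${service} args=$*"
}

# A "beeline" entry point that simply forwards to the same dispatcher.
toy_beeline() {
  toy_hive --service beeline "$@"
}

toy_hive -e "SELECT 1"
toy_beeline --help
```

Here toy_hive -e "SELECT 1" reports the cli service, while toy_beeline --help reports beeline, mirroring how beeline forwards to hive --service beeline.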

Hive CLI (New)


Because Hive CLI is so widely used, the Hive community is replacing its implementation with a new Hive CLI built on top of Beeline plus embedded HiveServer2 (HIVE-10511), so that only a single code path needs to be maintained.[2]

As a result, the new Hive CLI is simply an alias for Beeline at two levels:
  • Shell script level
  • High code level

Using JMH (the Java Microbenchmark Harness) to measure the average time taken to retrieve a data set, the community reported no clear performance gap between the new Hive CLI and Beeline when retrieving data.

Interactive Shell Commands Support

When $HIVE_HOME/bin/hive is run without either the -e or the -f option, it enters interactive shell mode. To learn more, see [4] and [9].
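The rule above (no -e or -f means interactive mode) can be summarized in a small sketch. hive_cmd_for is a hypothetical helper that only prints the Hive CLI invocation it would choose: no argument yields the interactive shell, an existing file yields batch mode with -f, and any other string is treated as a query for -e.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the Hive CLI invocation for a given input.
#   no argument            -> interactive shell (neither -e nor -f)
#   existing regular file  -> batch mode: hive -f FILE
#   anything else          -> batch mode: hive -e 'QUERY'
hive_cmd_for() {
  if [ $# -eq 0 ]; then
    echo "hive"
  elif [ -f "$1" ]; then
    echo "hive -f $1"
  else
    echo "hive -e '$1'"
  fi
}

hive_cmd_for
hive_cmd_for "SELECT COUNT(*) FROM t"
```

Beeline accepts the same -e and -f options, so the same decision applies once a connection URL is added.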

Beeline


With HiveServer2 (HS2), Beeline is the recommended command-line interface. To learn more, see [1], [5], [6], and [7].

References

  1. Migrating from Hive CLI to Beeline: A Primer
  2. Replacing the Implementation of Hive CLI Using Beeline
  3. Setting up HiveServer2 (Apache Hive)
  4. Hive CLI
  5. HiveServer2 Clients (Apache) 
  6. SQLLine Manual
  7. Beeline—Command Line Shell
  8. Embedded mode
    • Running Hive client tools with embedded servers is a convenient way to test a query or debug a problem. While both Hive CLI and Beeline can embed a Hive server instance, you would start them in embedded mode in slightly different ways. 
  9. Using the Hive command line and Beeline (Book: Apache Hive Essentials)
    • In Beeline, commands that start with ! do not need a terminating ;.
    • When a query runs in Hive CLI, MapReduce statistics are shown in the console while it is processing, whereas Beeline does not show them.
    • Neither Beeline nor Hive CLI supports running a pasted query that contains <tab> characters, because <tab> triggers autocompletion by default. Running the query from a file avoids this issue.
    • Hive CLI shows the exact line and position of query or syntax errors when the query spans multiple lines. Beeline, however, processes a multi-line query as a single line, so it reports only the position, with the line number always shown as 1. In this respect, Hive CLI is more convenient than Beeline for debugging Hive queries.
    • In both Hive CLI and Beeline, using the up and down arrow keys can retrieve up to 10,000 previous commands. The !history command can be used in Beeline to show all history.
    • Both Hive CLI and Beeline support variable substitution.
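The embedded-mode and variable-substitution notes above can be tied together in one small sketch. embedded_pair is a hypothetical helper that only prints commands. It relies on two documented behaviors: Beeline starts an embedded HiveServer2 when its JDBC URL is jdbc:hive2:// with no host or port, while Hive CLI runs queries in-process; and both tools accept --hivevar NAME=VALUE for a variable referenced in the query as ${hivevar:NAME}. The variable name and table in the example are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for one query and one variable definition, print the
# Hive CLI command and the equivalent Beeline command in embedded mode.
#   jdbc:hive2://  (no host or port) asks Beeline to start an embedded HS2.
#   --hivevar NAME=VALUE defines a variable referenced as ${hivevar:NAME}.
embedded_pair() {
  local var="$1" query="$2"
  echo "hive --hivevar $var -e '$query'"
  echo "beeline -u jdbc:hive2:// --hivevar $var -e '$query'"
}

# Single quotes keep the shell from expanding ${hivevar:tbl}; Hive does it.
embedded_pair tbl=sales 'SELECT COUNT(*) FROM ${hivevar:tbl}'
```

Printing both lines side by side is a quick way to check a Hive CLI script against its Beeline equivalent before switching over.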
