From 02356a38788c076bf95a1ad5cd28ea99a794ec4e Mon Sep 17 00:00:00 2001
From: Ada Wang <wang4luning@gmail.com>
Date: Fri, 29 Apr 2022 18:39:25 +0800
Subject: [PATCH] [KYUUBI #2025][HIVE] Add a Hive on Yarn doc

### _Why are the changes needed?_

jackson-annotations 2.13 and hive-exec 2.3.9 have class conflict

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #2326 from deadwind4/hive-ci.

Closes #2025

0644c564 [Ada Wang] [KYUUBI #2025][HIVE] Add a hive on yarn doc

Authored-by: Ada Wang <wang4luning@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
---
 docs/deployment/engine_on_yarn.md | 56 ++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/docs/deployment/engine_on_yarn.md b/docs/deployment/engine_on_yarn.md
index adb336e05..9c94bb6f6 100644
--- a/docs/deployment/engine_on_yarn.md
+++ b/docs/deployment/engine_on_yarn.md
@@ -146,7 +146,7 @@ yarn.application.id: application_00000000XX_00XX
 
 Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` is configured and points to the Hadoop client configurations directory, usually, `$HADOOP_HOME/etc/hadoop`.
 
-If the `HADOOP_CONF_DIR` points the YARN and HDFS cluster correctly, and the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session, and submit an example job:
+If the `HADOOP_CONF_DIR` points to the YARN and HDFS cluster correctly, and the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session, and submit an example job:
 ```bash
 # we assume to be in the root directory of 
 # the unzipped Flink distribution
@@ -186,3 +186,57 @@ As Kyuubi Flink SQL engine wraps the Flink SQL client that currently does not su
 so `security.kerberos.login.keytab` and `security.kerberos.login.principal` should not use now.
 
 Instead, you can schedule a periodically `kinit` process via `crontab` task on the local machine that hosts Kyuubi server or simply use [Kyuubi Kinit](settings.html#kinit).
+
+## Deploy Kyuubi Hive Engine on Yarn
+
+### Requirements
+
+Before deploying Kyuubi's Hive SQL engines on YARN, make sure the following requirements are met.
+
+- Knowing the basics about [Running Hive on YARN](https://cwiki.apache.org/confluence/display/Hive/GettingStarted)
+- A binary distribution of Hive
+  - You can use the built-in Hive distribution
+  - Download a recent Hive distribution from the [Hive official website](https://hive.apache.org/downloads.html) and unpack it
+  - You can [build Hive from source](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-BuildingHivefromSource)
+- An active [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster
+  - Make sure your YARN cluster is ready to accept Hive applications by running `yarn top`; it should show no error messages
+- An active [Apache Hadoop HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) cluster
+- Set up Hadoop client configurations on the machine where the Kyuubi server is located
+- An active [Hive Metastore Service](https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore)
+
+### Configurations
+
+#### Environment
+
+Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` must be configured to point to the Hadoop client configuration directory, usually `$HADOOP_HOME/etc/hadoop`.
+
+If `HADOOP_CONF_DIR` points to the YARN and HDFS cluster correctly, you should be able to run the following Hive SQL example on YARN.
+
+```bash
+$ $HIVE_HOME/bin/hiveserver2
+# In another terminal
+$ $HIVE_HOME/bin/beeline -u 'jdbc:hive2://localhost:10000/default'
+0: jdbc:hive2://localhost:10000/default> CREATE TABLE pokes (foo INT, bar STRING);
+0: jdbc:hive2://localhost:10000/default> INSERT INTO TABLE pokes VALUES (1, 'hello');
+```
+
+If the Hive SQL statements succeed and a job appears in the YARN web UI, the Hive environment is working correctly.
+
+#### Required Environment Variable
+
+The `HIVE_HADOOP_CLASSPATH` environment variable is also required. It should contain `commons-collections-*.jar`,
+`hadoop-client-runtime-*.jar`, `hadoop-client-api-*.jar` and `htrace-core4-*.jar`.
+All four jars can be found under `HADOOP_HOME`.
+
+For example, in Hadoop 3.1.0 they are located at:
+- `${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar`
+- `${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar`
+- `${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar`
+- `${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar`
+
+Configure them in `$KYUUBI_HOME/conf/kyuubi-env.sh` or `$HIVE_HOME/conf/hive-env.sh`, e.g.
+
+```bash
+$ echo "export HADOOP_CONF_DIR=/path/to/hadoop/conf" >> $KYUUBI_HOME/conf/kyuubi-env.sh
+$ echo "export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar" >> $KYUUBI_HOME/conf/kyuubi-env.sh
+```