[KYUUBI #2025][HIVE] Add a Hive on Yarn doc

### _Why are the changes needed?_

jackson-annotations 2.13 and hive-exec 2.3.9 have a class conflict

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #2326 from deadwind4/hive-ci.

Closes #2025

0644c564 [Ada Wang] [KYUUBI #2025][HIVE] Add a hive on yarn doc

Authored-by: Ada Wang <wang4luning@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
This commit is contained in:
Ada Wang 2022-04-29 18:39:25 +08:00 committed by Kent Yao
parent 3ab2c81dce
commit 02356a3878

@ -146,7 +146,7 @@ yarn.application.id: application_00000000XX_00XX
Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` is configured and points to the Hadoop client configurations directory, usually, `$HADOOP_HOME/etc/hadoop`.
If the `HADOOP_CONF_DIR` points the YARN and HDFS cluster correctly, and the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session, and submit an example job:
If the `HADOOP_CONF_DIR` points to the YARN and HDFS cluster correctly, and the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session, and submit an example job:
```bash
# we assume to be in the root directory of
# the unzipped Flink distribution
@ -186,3 +186,57 @@ As Kyuubi Flink SQL engine wraps the Flink SQL client that currently does not su
so `security.kerberos.login.keytab` and `security.kerberos.login.principal` should not be used for now.
Instead, you can schedule a periodic `kinit` process via a `crontab` task on the local machine that hosts the Kyuubi server, or simply use [Kyuubi Kinit](settings.html#kinit).
## Deploy Kyuubi Hive Engine on Yarn
### Requirements
Before deploying Kyuubi's Hive SQL engines on YARN, make sure you have the following in place.
- Basic knowledge of [Running Hive on YARN](https://cwiki.apache.org/confluence/display/Hive/GettingStarted)
- A binary distribution of Hive, obtained in one of these ways
  - Use the built-in Hive distribution
  - Download a recent Hive distribution from the [Hive official website](https://hive.apache.org/downloads.html) and unpack it
  - [Build Hive](https://cwiki.apache.org/confluence/display/Hive//GettingStarted#GettingStarted-BuildingHivefromSource) from source
- An active [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster
  - Make sure your YARN cluster is ready to accept Hive applications by running `yarn top`. It should show no error messages
- An active [Apache Hadoop HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) cluster
- Hadoop client configurations set up on the machine that hosts the Kyuubi server
- An active [Hive Metastore Service](https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore)
### Configurations
#### Environment
Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` must be configured and point to the Hadoop client configuration directory, usually `$HADOOP_HOME/etc/hadoop`.
If `HADOOP_CONF_DIR` points to the YARN and HDFS cluster correctly, you should be able to run the following Hive SQL example on YARN.
```bash
$ $HIVE_HOME/bin/hiveserver2
# In another terminal
$ $HIVE_HOME/bin/beeline -u 'jdbc:hive2://localhost:10000/default'
0: jdbc:hive2://localhost:10000/default> CREATE TABLE pokes (foo INT, bar STRING);
0: jdbc:hive2://localhost:10000/default> INSERT INTO TABLE pokes VALUES (1, 'hello');
```
If the statements succeed and a corresponding job appears in the YARN web UI, the Hive environment is working correctly.
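Before starting HiveServer2, it can help to confirm that the configuration directory is actually usable. The following is a minimal sketch, not part of Kyuubi or Hive: `check_hadoop_conf` is a hypothetical helper, and the assumption that a usable client configuration contains at least `core-site.xml` and `yarn-site.xml` may not hold for every distribution.

```bash
# Sketch: verify that a Hadoop client configuration directory is usable.
# check_hadoop_conf is a hypothetical helper, not a Kyuubi or Hive command.
check_hadoop_conf() {
  # Prefer HADOOP_CONF_DIR and fall back to YARN_CONF_DIR.
  local conf_dir="${HADOOP_CONF_DIR:-${YARN_CONF_DIR:-}}"
  local f
  if [ -z "$conf_dir" ]; then
    echo "Neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set" >&2
    return 1
  fi
  if [ ! -d "$conf_dir" ]; then
    echo "Not a directory: $conf_dir" >&2
    return 1
  fi
  # Assumption: a usable client configuration ships at least these two files.
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "Missing $conf_dir/$f" >&2
      return 1
    fi
  done
  # Print the directory that was validated.
  echo "$conf_dir"
}
```

Calling `check_hadoop_conf` prints the validated directory on success and returns a non-zero status with a message on stderr otherwise. It only inspects local files, so it does not prove that the cluster itself is reachable.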
#### Required Environment Variable
The `HIVE_HADOOP_CLASSPATH` environment variable is also required. It should contain `commons-collections-*.jar`,
`hadoop-client-runtime-*.jar`, `hadoop-client-api-*.jar` and `htrace-core4-*.jar`.
All four jars are located under `HADOOP_HOME`.
For example, in Hadoop 3.1.0 they can be found at the following locations.
- `${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar`
- `${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar`
- `${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar`
- `${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar`
Configure them in `$KYUUBI_HOME/conf/kyuubi-env.sh` or `$HIVE_HOME/conf/hive-env.sh`, e.g.
```bash
$ echo "export HADOOP_CONF_DIR=/path/to/hadoop/conf" >> $KYUUBI_HOME/conf/kyuubi-env.sh
$ echo "export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar" >> $KYUUBI_HOME/conf/kyuubi-env.sh
```
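The jar versions in the example above are specific to Hadoop 3.1.0. As a rough sketch, the classpath could instead be assembled from the jar name patterns, so that the versions do not have to be hard-coded. `build_hive_hadoop_classpath` is a hypothetical helper, and it assumes the jars sit in the same subdirectories as in the Hadoop 3.1.0 layout listed above, which may differ in other releases.

```bash
# Sketch: assemble HIVE_HADOOP_CLASSPATH from jar name patterns.
# build_hive_hadoop_classpath is a hypothetical helper; the subdirectories
# are assumed to match the Hadoop 3.1.0 layout.
build_hive_hadoop_classpath() {
  local hadoop_home="$1"
  local classpath="" pattern jar candidate
  for pattern in \
    "share/hadoop/common/lib/commons-collections-*.jar" \
    "share/hadoop/client/hadoop-client-runtime-*.jar" \
    "share/hadoop/client/hadoop-client-api-*.jar" \
    "share/hadoop/common/lib/htrace-core4-*.jar"; do
    jar=""
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for candidate in "$hadoop_home"/$pattern; do
      if [ -f "$candidate" ]; then
        jar="$candidate"   # keep the first match only
        break
      fi
    done
    if [ -z "$jar" ]; then
      echo "No jar matching $pattern under $hadoop_home" >&2
      return 1
    fi
    # Join entries with ':' without a leading separator.
    classpath="${classpath:+$classpath:}$jar"
  done
  echo "$classpath"
}
```

With this helper defined, the export line could be generated with something like `echo "export HIVE_HADOOP_CLASSPATH=$(build_hive_hadoop_classpath "$HADOOP_HOME")" >> $KYUUBI_HOME/conf/kyuubi-env.sh`. The function fails with a non-zero status if any of the four jars cannot be found, so a missing jar is noticed before the file is modified only if the result is checked first.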