From 02356a38788c076bf95a1ad5cd28ea99a794ec4e Mon Sep 17 00:00:00 2001
From: Ada Wang
Date: Fri, 29 Apr 2022 18:39:25 +0800
Subject: [PATCH] [KYUUBI #2025][HIVE] Add a Hive on Yarn doc

### _Why are the changes needed?_

jackson-annotations 2.13 and hive-exec 2.3.9 have a class conflict.

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #2326 from deadwind4/hive-ci.

Closes #2025

0644c564 [Ada Wang] [KYUUBI #2025][HIVE] Add a hive on yarn doc

Authored-by: Ada Wang
Signed-off-by: Kent Yao
---
 docs/deployment/engine_on_yarn.md | 56 ++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/docs/deployment/engine_on_yarn.md b/docs/deployment/engine_on_yarn.md
index adb336e05..9c94bb6f6 100644
--- a/docs/deployment/engine_on_yarn.md
+++ b/docs/deployment/engine_on_yarn.md
@@ -146,7 +146,7 @@ yarn.application.id: application_00000000XX_00XX
 
 Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` is configured and points to the Hadoop client configurations directory, usually, `$HADOOP_HOME/etc/hadoop`.
 
-If the `HADOOP_CONF_DIR` points the YARN and HDFS cluster correctly, and the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session, and submit an example job:
+If the `HADOOP_CONF_DIR` points to the YARN and HDFS cluster correctly, and the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session, and submit an example job:
 ```bash
 # we assume to be in the root directory of
 # the unzipped Flink distribution
@@ -186,3 +186,57 @@ As Kyuubi Flink SQL engine wraps the Flink SQL client that currently does not su
 so `security.kerberos.login.keytab` and `security.kerberos.login.principal` should not use now.
 
 Instead, you can schedule a periodically `kinit` process via `crontab` task on the local machine that hosts Kyuubi server or simply use [Kyuubi Kinit](settings.html#kinit).
+
+## Deploy Kyuubi Hive Engine on Yarn
+
+### Requirements
+
+When deploying Kyuubi's Hive SQL engines on YARN, make sure the following requirements are met.
+
+- Familiarity with the basics of [Running Hive on YARN](https://cwiki.apache.org/confluence/display/Hive/GettingStarted)
+- A binary distribution of Hive
+  - You can use the built-in Hive distribution
+  - Download a recent Hive distribution from the [Hive official website](https://hive.apache.org/downloads.html) and unpack it
+  - You can [build Hive](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-BuildingHivefromSource) from source
+- An active [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster
+  - Make sure your YARN cluster is ready to accept Hive applications by running `yarn top`; it should show no error messages
+- An active [Apache Hadoop HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) cluster
+- Hadoop client configurations set up on the machine that hosts the Kyuubi server
+- An active [Hive Metastore Service](https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore)
+
+### Configurations
+
+#### Environment
+
+Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` must be configured to point to the Hadoop client configurations directory, usually `$HADOOP_HOME/etc/hadoop`.
+
+If `HADOOP_CONF_DIR` points to the YARN and HDFS clusters correctly, you should be able to run the following Hive SQL example on YARN.
+
+```bash
+$ $HIVE_HOME/bin/hiveserver2
+# In another terminal
+$ $HIVE_HOME/bin/beeline -u 'jdbc:hive2://localhost:10000/default'
+0: jdbc:hive2://localhost:10000/default> CREATE TABLE pokes (foo INT, bar STRING);
+0: jdbc:hive2://localhost:10000/default> INSERT INTO TABLE pokes VALUES (1, 'hello');
+```
+
+If the Hive SQL statements succeed and a corresponding job appears in the YARN Web UI, the Hive environment is working correctly.
+
+#### Required Environment Variable
+
+`HIVE_HADOOP_CLASSPATH` is also required. It should contain `commons-collections-*.jar`,
+`hadoop-client-runtime-*.jar`, `hadoop-client-api-*.jar` and `htrace-core4-*.jar`.
+All four jars can be found under `HADOOP_HOME`.
+
+For example, in Hadoop 3.1.0 they are located at:
+- `${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar`
+- `${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar`
+- `${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar`
+- `${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar`
+
+Configure them in `$KYUUBI_HOME/conf/kyuubi-env.sh` or `$HIVE_HOME/conf/hive-env.sh`, for example:
+
+```bash
+$ echo "export HADOOP_CONF_DIR=/path/to/hadoop/conf" >> $KYUUBI_HOME/conf/kyuubi-env.sh
+$ echo "export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar" >> $KYUUBI_HOME/conf/kyuubi-env.sh
+```
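The `HIVE_HADOOP_CLASSPATH` value in the doc hard-codes the Hadoop 3.1.0 jar versions. A version-agnostic alternative could locate the four jars with shell globs. The sketch below is hedged and makes two assumptions:

- the `share/hadoop/client` and `share/hadoop/common/lib` layout listed in the doc;
- `build_hive_hadoop_classpath`, a hypothetical helper that is not part of Kyuubi, Hive or Hadoop.

The helper keeps the first match for each pattern and fails if any of the four jars is missing.

```bash
# Hypothetical helper (not part of Kyuubi, Hive or Hadoop): builds a
# HIVE_HADOOP_CLASSPATH value by globbing for the four required jars,
# assuming the share/hadoop/{client,common/lib} layout listed above.
build_hive_hadoop_classpath() {
  local hadoop_home="$1"
  local classpath="" pattern
  for pattern in \
      'share/hadoop/common/lib/commons-collections-*.jar' \
      'share/hadoop/client/hadoop-client-runtime-*.jar' \
      'share/hadoop/client/hadoop-client-api-*.jar' \
      'share/hadoop/common/lib/htrace-core4-*.jar'; do
    # $pattern is left unquoted so the shell expands the glob; if nothing
    # matches, the pattern stays literal and the -e test below fails.
    set -- "$hadoop_home"/$pattern
    if [ ! -e "$1" ]; then
      echo "missing jar: $hadoop_home/$pattern" >&2
      return 1
    fi
    # Keep the first match and append it, colon-separated.
    classpath="${classpath:+$classpath:}$1"
  done
  printf '%s\n' "$classpath"
}
```

It could then be used in the shell that writes `kyuubi-env.sh`. For example: `echo "export HIVE_HADOOP_CLASSPATH=$(build_hive_hadoop_classpath "$HADOOP_HOME")" >> $KYUUBI_HOME/conf/kyuubi-env.sh`.

Taking the first match is a simplification. If several versions of a jar sit side by side, the shell's sorted glob order decides which one is used.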