<!--
Thanks for sending a pull request!
Here are some tips for you:
1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/contributions.html
2. If the PR is related to an issue in https://github.com/apache/incubator-kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'.
3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'.
-->
### _Why are the changes needed?_
<!--
Please clarify why the changes are needed. For instance,
1. If you add a feature, you can talk about the use case of it.
2. If you fix a bug, you can clarify why it is a bug.
-->
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before making a pull request
Closes #951 from pan3793/license.
Closes #951
4629eecd [Cheng Pan] Fix
c45a0784 [Cheng Pan] nit
b9a46b42 [Cheng Pan] pin license header at first line
80d1a71b [Cheng Pan] nit
b2a46e4c [Cheng Pan] Update
f6acaaf8 [Cheng Pan] minor
ef99183f [Cheng Pan] Add license header on all docs
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit a76c344042)
Signed-off-by: Cheng Pan <chengpan@apache.org>
121 lines
5.7 KiB
Markdown
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-->

<div align=center>

</div>

# Deploy Kyuubi engines on YARN

## Requirements

Before deploying Kyuubi's Spark SQL engines on YARN, make sure you have the following:

- Basic knowledge of [Running Spark on YARN](http://spark.apache.org/docs/latest/running-on-yarn.html)
- A binary distribution of Spark built with YARN support
  - You can use the built-in Spark distribution
  - You can get it from the [Spark official website](https://spark.apache.org/downloads.html) directly
  - You can [Build Spark](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn) with the `-Pyarn` Maven option
- An active [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) cluster
- An active Apache Hadoop HDFS cluster
- Hadoop client configurations set up on the machine that hosts the Kyuubi server

## Configurations

### Environment

Either `HADOOP_CONF_DIR` or `YARN_CONF_DIR` must be set and point to the Hadoop client configuration directory, usually `$HADOOP_HOME/etc/hadoop`.

If `HADOOP_CONF_DIR` points to the YARN and HDFS cluster correctly, you should be able to run the `SparkPi` example on YARN.
```bash
$ HADOOP_CONF_DIR=/path/to/hadoop/conf $SPARK_HOME/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --queue thequeue \
    $SPARK_HOME/examples/jars/spark-examples*.jar \
    10
```

If the `SparkPi` example succeeds, set `HADOOP_CONF_DIR` in `$KYUUBI_HOME/conf/kyuubi-env.sh` or `$SPARK_HOME/conf/spark-env.sh`, e.g.

```bash
$ echo "export HADOOP_CONF_DIR=/path/to/hadoop/conf" >> $KYUUBI_HOME/conf/kyuubi-env.sh
```

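Before starting Kyuubi, it can help to confirm that one of the two variables is set and that the directory looks like a Hadoop client configuration. The function below is a minimal sketch, not part of Kyuubi: the name `check_hadoop_conf` is hypothetical, and the assumption that `core-site.xml` and `yarn-site.xml` are the files worth checking reflects a typical Hadoop client layout rather than a documented requirement.

```bash
# Hypothetical helper (not shipped with Kyuubi): print the Hadoop client
# configuration directory if it can be found, otherwise fail.
check_hadoop_conf() {
  # Prefer HADOOP_CONF_DIR and fall back to YARN_CONF_DIR.
  local conf_dir="${HADOOP_CONF_DIR:-${YARN_CONF_DIR:-}}"
  if [ -z "$conf_dir" ]; then
    echo "neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set" >&2
    return 1
  fi
  # Assumption: a usable client configuration has at least these two files.
  local f
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing $conf_dir/$f" >&2
      return 1
    fi
  done
  echo "$conf_dir"
}
```

Running `check_hadoop_conf` in the same shell that launches the Kyuubi server shows which directory would be picked up, or why none was found.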
### Spark Properties

These properties are defined by Spark, and Kyuubi passes them to `spark-submit` to create Spark applications.

**Note:** None of these take effect if the application for a particular user already exists.

You can specify them in any of the following places:

- In the JDBC connection URL, e.g. `jdbc:hive2://localhost:10009/;#spark.master=yarn;spark.yarn.queue=thequeue`
- In `$KYUUBI_HOME/conf/kyuubi-defaults.conf`
- In `$SPARK_HOME/conf/spark-defaults.conf`

**Note:** The priority goes down from top to bottom.

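The connection URL form shown in the list above can be assembled from individual `key=value` pairs. The following is a small sketch; `build_kyuubi_jdbc_url` is a hypothetical helper name, and the host and port are the illustrative values from the example URL.

```bash
# Hypothetical helper: build a JDBC URL that carries Spark properties
# after the ';#' separator, joined by ';' as in the example above.
build_kyuubi_jdbc_url() {
  local host="$1" port="$2"
  shift 2
  local confs="" kv
  for kv in "$@"; do
    # Add a ';' between pairs, but not before the first one.
    confs="${confs:+$confs;}$kv"
  done
  echo "jdbc:hive2://${host}:${port}/;#${confs}"
}

# Example usage (beeline is assumed to be on the PATH):
#   beeline -u "$(build_kyuubi_jdbc_url localhost 10009 spark.master=yarn spark.yarn.queue=thequeue)"
```

Because the URL has the highest priority of the three places, values passed this way override the same keys in `kyuubi-defaults.conf` and `spark-defaults.conf`.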
#### Master

Setting `spark.master=yarn` tells Kyuubi to submit Spark SQL engine applications to the YARN cluster manager.

#### Queue

Set `spark.yarn.queue=thequeue` in the JDBC connection string to tell Kyuubi which YARN queue to use.
Otherwise, the queue configured on the Kyuubi server side is used by default.

#### Sizing

Pass the configurations below through the JDBC connection string to set how many Spark executor instances are used
and how many CPU cores and how much memory the Spark driver, the ApplicationMaster, and each executor take.

Name | Default | Meaning
--- | --- | ---
spark.executor.instances | 1 | The number of executors for static allocation
spark.executor.cores | 1 | The number of cores to use on each executor
spark.yarn.am.memory | 512m | Amount of memory to use for the YARN Application Master in client mode
spark.yarn.am.memoryOverhead | amMemory * 0.10, with minimum of 384 | Amount of non-heap memory to be allocated per AM process in client mode
spark.driver.memory | 1g | Amount of memory to use for the driver process
spark.driver.memoryOverhead | driverMemory * 0.10, with minimum of 384 | Amount of non-heap memory to be allocated per driver process in cluster mode
spark.executor.memory | 1g | Amount of memory to use for the executor process
spark.executor.memoryOverhead | executorMemory * 0.10, with minimum of 384 | Amount of additional memory to be allocated per executor process. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc.

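The overhead defaults in the table mean that the memory requested from YARN for a process is larger than the configured heap. As a rough worked example, the sketch below computes heap plus default overhead in MiB; `container_memory_mib` is a hypothetical name, and the result ignores any rounding that the YARN scheduler may apply on top.

```bash
# Hypothetical helper: approximate memory requested for one process, in MiB,
# using the default overhead rule from the table: max(memory * 0.10, 384).
container_memory_mib() {
  local heap_mib="$1"
  # Integer arithmetic: 10% of the heap, rounded down.
  local overhead_mib=$(( heap_mib * 10 / 100 ))
  if [ "$overhead_mib" -lt 384 ]; then
    overhead_mib=384
  fi
  echo $(( heap_mib + overhead_mib ))
}

container_memory_mib 1024   # 1g executor: 1024 + 384 = 1408
container_memory_mib 8192   # 8g executor: 8192 + 819 = 9011
```

For small heaps the fixed 384 MiB minimum dominates, so a `1g` executor asks for noticeably more than 1 GiB.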
It is recommended to use [Dynamic Allocation](http://spark.apache.org/docs/3.0.1/configuration.html#dynamic-allocation) with Kyuubi,
because the SQL engine is long-running, executes queries from clients at irregular intervals,
and the demand for computing resources varies from query to query.
This lets Spark release executors when a query is lightweight or the SQL engine is idle.

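One way to turn this on is to append the relevant Spark properties to a defaults file. The sketch below is an illustration only: `append_dynamic_allocation_conf` is a hypothetical helper, the executor bounds and idle timeout are placeholder values, and it assumes Spark 3.0 or later with shuffle tracking instead of an external shuffle service.

```bash
# Hypothetical helper: append dynamic allocation settings to a
# spark-defaults.conf style file. All values are illustrative assumptions.
append_dynamic_allocation_conf() {
  local conf_file="$1" min="${2:-0}" max="${3:-10}"
  cat >> "$conf_file" <<EOF
spark.dynamicAllocation.enabled                  true
spark.dynamicAllocation.shuffleTracking.enabled  true
spark.dynamicAllocation.minExecutors             ${min}
spark.dynamicAllocation.maxExecutors             ${max}
spark.dynamicAllocation.executorIdleTimeout      60s
EOF
}

# Example usage:
#   append_dynamic_allocation_conf "$SPARK_HOME/conf/spark-defaults.conf" 0 20
```

On clusters that run the external shuffle service, `spark.shuffle.service.enabled` would typically replace the shuffle tracking line.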
#### Tuning

You can set `spark.yarn.archive` or `spark.yarn.jars` to a world-readable location on HDFS that contains the Spark jars,
which allows YARN to cache them on nodes so that they do not need to be distributed each time an application runs.

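A common way to prepare such a location is to pack the jars into one archive and upload it. The following is a sketch under stated assumptions: `run` and `publish_spark_archive` are hypothetical helpers, the local archive path and the HDFS directory are placeholders, and the `jar` and `hdfs` commands are assumed to be on the PATH. Setting `DRY_RUN=1` prints the commands instead of executing them.

```bash
# Hypothetical wrapper: echo the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: pack $SPARK_HOME/jars and upload the archive to HDFS.
publish_spark_archive() {
  local spark_home="$1" hdfs_dir="$2" archive="${3:-/tmp/spark-libs.jar}"
  local name
  name="$(basename "$archive")"
  # Pack the jars, uncompressed, at the root of the archive.
  run jar cv0f "$archive" -C "$spark_home/jars" . || return 1
  run hdfs dfs -mkdir -p "$hdfs_dir" || return 1
  run hdfs dfs -put -f "$archive" "$hdfs_dir/" || return 1
  # Make the archive readable by all users so that YARN can cache it.
  run hdfs dfs -chmod 644 "$hdfs_dir/$name" || return 1
  # Print the property to add to the JDBC URL or a defaults file.
  echo "spark.yarn.archive=${hdfs_dir}/${name}"
}

# Example usage:
#   DRY_RUN=1 publish_spark_archive "$SPARK_HOME" hdfs:///spark/libs
```

The last line printed is the property to set; permissions on the parent directories may also matter for caching, depending on the cluster.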
#### Others

Please refer to [Spark properties](http://spark.apache.org/docs/latest/running-on-yarn.html#spark-properties) for other supported configurations.

## Kerberos

Kyuubi currently does not support Spark's [YARN-specific Kerberos Configuration](http://spark.apache.org/docs/3.0.1/running-on-yarn.html#kerberos),
so `spark.kerberos.keytab` and `spark.kerberos.principal` should not be used for now.

Instead, you can schedule a periodic `kinit` process via a `crontab` task on the local machine that hosts the Kyuubi server, or simply use [Kyuubi Kinit](settings.html#kinit).

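The `crontab` approach can be sketched as a single entry that renews the ticket from a keytab. Everything specific here is an assumption for illustration: `kinit_cron_entry` is a hypothetical helper, and the keytab path, principal, and six-hour interval are placeholders that should be chosen to fit the ticket lifetime of your realm.

```bash
# Hypothetical helper: print a crontab line that runs 'kinit -kt' at
# minute 0 of every N-th hour.
kinit_cron_entry() {
  local keytab="$1" principal="$2" every_hours="${3:-6}"
  echo "0 */${every_hours} * * * kinit -kt ${keytab} ${principal}"
}

# Example usage, appending the entry to the current user's crontab:
#   (crontab -l 2>/dev/null; kinit_cron_entry /path/to/kyuubi.keytab kyuubi/host@EXAMPLE.COM) | crontab -
```

The entry should be installed for the same OS user that runs the Kyuubi server, so that the renewed ticket lands in the credential cache the server reads.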