kyuubi/docs/configurations.md
2018-03-07 16:09:45 +08:00

# Configuration Guide
[TOC]
Kyuubi provides several kinds of properties to configure the system:
**Kyuubi properties:** control most of the Kyuubi server's own behaviors. Most of them are determined at server start. They can be treated like normal Spark properties by setting them in the `spark-defaults.conf` file or via the `--conf` parameter in the server start scripts.
**Spark properties:** become session-level options, which are used to generate SparkContext instances and are passed to the Kyuubi server via JDBC/ODBC connection strings. Setting them in `$SPARK_HOME/conf/spark-defaults.conf` supplies default values for each session.
**Hive properties:**
1. Hive client options, which are used by SparkSession to talk to the Hive Metastore server, can be configured in a `hive-site.xml` placed in the `$SPARK_HOME/conf` directory, or treated as Spark properties with the `spark.hadoop.` prefix.
2. Kyuubi Frontend Service options are a small subset of HiveServer2's options.
**Hadoop properties:** specify `HADOOP_CONF_DIR` or `YARN_CONF_DIR` to point to the directory that contains the Hadoop configuration files.
**Logging** can be configured through `$SPARK_HOME/conf/log4j.properties`.
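For example, to keep server logs at INFO level while quieting a chatty third-party library, a `log4j.properties` along these lines could be used (the logger category below is illustrative; adjust it to the classes you actually see in your logs):

```properties
# Send everything at INFO and above to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet down noisy third-party loggers (example category)
log4j.logger.org.spark_project.jetty=WARN
```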
## Kyuubi Configurations
Kyuubi properties control most of the Kyuubi server's own behaviors. Most of them are determined at server start. They can be treated like normal Spark properties by setting them in the `spark-defaults.conf` file or via the `--conf` parameter in the server start scripts.
For instance, to start Kyuubi with HA enabled:
```bash
$ bin/start-kyuubi.sh \
--master yarn \
--deploy-mode client \
--driver-memory 10g \
--conf spark.kyuubi.ha.enabled=true \
--conf spark.kyuubi.ha.zk.quorum=zk1.server.url,zk2.server.url
```
#### High Availability
Name|Default|Description
---|---|---
spark.kyuubi.ha.enabled|false|Whether KyuubiServer supports dynamic service discovery for its clients. To support this, each instance of KyuubiServer currently uses ZooKeeper to register itself, when it is brought up. JDBC/ODBC clients should use the ZooKeeper ensemble: spark.kyuubi.ha.zk.quorum in their connection string.
spark.kyuubi.ha.zk.quorum|none|Comma separated list of ZooKeeper servers to talk to, when KyuubiServer supports service discovery via ZooKeeper.
spark.kyuubi.ha.zk.namespace|kyuubiserver|The parent node in ZooKeeper used by KyuubiServer when supporting dynamic service discovery.
spark.kyuubi.ha.zk.client.port|2181|The port of ZooKeeper servers to talk to. If the list of ZooKeeper servers specified in spark.kyuubi.ha.zk.quorum does not contain port numbers, this value is used.
spark.kyuubi.ha.zk.session.timeout|1200000|ZooKeeper client's session timeout (in milliseconds). The client is disconnected, and as a result all locks are released, if a heartbeat is not sent within the timeout.
spark.kyuubi.ha.zk.connection.basesleeptime|1000|Initial amount of time (in milliseconds) to wait between retries when connecting to the ZooKeeper server with the ExponentialBackoffRetry policy.
spark.kyuubi.ha.zk.connection.max.retries|3|Maximum number of retries for connecting to the ZooKeeper server.
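With service discovery enabled, clients connect through the ZooKeeper ensemble rather than a fixed host. Assuming the HiveServer2-style discovery syntax that Kyuubi's frontend inherits, a JDBC connection string might look like this (host names are placeholders, and the namespace matches `spark.kyuubi.ha.zk.namespace`):

```
jdbc:hive2://zk1.server.url:2181,zk2.server.url:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubiserver
```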
#### Operation Log
Name|Default|Description
---|---|---
spark.kyuubi.logging.operation.enabled|true|When true, KyuubiServer will save operation logs and make them available for clients
spark.kyuubi.logging.operation.log.dir|`SPARK_LOG_DIR` -> `SPARK_HOME`/operation_logs -> `java.io.tmpdir`/operation_logs|Top level directory where operation logs are stored if logging functionality is enabled
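As a sketch, a `spark-defaults.conf` fragment that keeps operation logging on but pins the log directory to an explicit location (the path below is an example, not a default):

```properties
spark.kyuubi.logging.operation.enabled  true
spark.kyuubi.logging.operation.log.dir  /var/log/kyuubi/operation_logs
```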
#### Background Execution Thread Pool
Name|Default|Description
---|---|---
spark.kyuubi.async.exec.threads|100|Number of threads in the async thread pool for KyuubiServer.
spark.kyuubi.async.exec.wait.queue.size|100|Size of the wait queue for async thread pool in KyuubiServer. After hitting this limit, the async thread pool will reject new requests.
spark.kyuubi.async.exec.keep.alive.time|10000|Time (in milliseconds) that an idle KyuubiServer async thread (from the thread pool) will wait for a new task to arrive before terminating.
spark.kyuubi.async.exec.shutdown.timeout|10000|How long (in milliseconds) KyuubiServer shutdown will wait for async threads to terminate.
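For instance, a deployment expecting many short concurrent queries might raise the pool and queue sizes in `spark-defaults.conf` (the numbers below are illustrative, not recommendations):

```properties
spark.kyuubi.async.exec.threads          200
spark.kyuubi.async.exec.wait.queue.size  500
```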
#### Session Idle Check
Name|Default|Description
---|---|---
spark.kyuubi.frontend.session.check.interval|6h|The check interval for frontend session/operation timeout, which can be disabled by setting it to a zero or negative value.
spark.kyuubi.frontend.session.timeout|8h|The idle timeout for frontend sessions/operations, which can be disabled by setting it to a zero or negative value.
spark.kyuubi.frontend.session.check.operation|true|A session is considered idle only if there is no activity and no pending operation. This setting takes effect only if the session idle timeout (`spark.kyuubi.frontend.session.timeout`) and the check interval (`spark.kyuubi.frontend.session.check.interval`) are enabled.
spark.kyuubi.backend.session.check.interval|20min|The check interval for the backend session (i.e. SparkSession) timeout.
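For example, to time idle sessions out more aggressively than the defaults, or to disable the check entirely with a non-positive value (the values below are examples):

```properties
# Check every hour; expire sessions idle for more than 2 hours
spark.kyuubi.frontend.session.check.interval  1h
spark.kyuubi.frontend.session.timeout         2h
# A zero or negative value disables the check:
# spark.kyuubi.frontend.session.check.interval  -1
```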
#### On Spark Session Init
Name|Default|Description
---|---|---
spark.kyuubi.backend.session.wait.other.times|60|How many times to check whether another session with the same user is still initializing its SparkContext. The total wait time is this value multiplied by `spark.kyuubi.backend.session.wait.other.interval`.
spark.kyuubi.backend.session.wait.other.interval|1s|The interval for checking whether another thread with the same user has completed SparkContext instantiation.
spark.kyuubi.backend.session.init.timeout|60s|How long the server waits before giving up on SparkContext instantiation.
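With the defaults above, a session waits at most `wait.other.times` multiplied by `wait.other.interval` for a peer session to finish bringing up the SparkContext, which lines up with the 60s init timeout:

```python
# Illustrative arithmetic for the default values listed above
wait_other_times = 60      # spark.kyuubi.backend.session.wait.other.times
wait_other_interval_s = 1  # spark.kyuubi.backend.session.wait.other.interval (1s)

total_wait_s = wait_other_times * wait_other_interval_s
print(total_wait_s)  # 60, matching spark.kyuubi.backend.session.init.timeout (60s)
```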
---
## Spark Configurations
Spark properties become session-level options, which are used to generate SparkContext instances and are passed to the Kyuubi server via JDBC/ODBC connection strings. Setting them in `$SPARK_HOME/conf/spark-defaults.conf` supplies default values for each session.
Name|Default|Description
---|---|---
spark.driver.memory|1g|Amount of memory to use for the Kyuubi Server instance. Set this through the `--driver-memory` command line option or in your default properties file.
spark.driver.extraJavaOptions|(none)|A string of extra JVM options to pass to the Kyuubi Server instance, for instance GC settings or other logging. Set this through the `--driver-java-options` command line option or in your default properties file.
Spark properties for the [Driver](http://spark.apache.org/docs/latest/configuration.html#runtime-environment), like those above, control Kyuubi Server's own behavior, while other properties can be set in JDBC/ODBC connection strings.
Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html) in the online documentation for an overview on how to configure Spark.
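Because the table above concerns the server's own JVM, these properties belong in the server-side `spark-defaults.conf` (or on the start script's command line), not in connection strings. For example:

```properties
spark.driver.memory            10g
spark.driver.extraJavaOptions  -XX:+UseG1GC -verbose:gc
```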
## Hive Configurations
### Hive client options
These configurations, which are used by SparkSession to talk to the Hive Metastore server, can be configured in a `hive-site.xml` placed in the `$SPARK_HOME/conf` directory, or treated as Spark properties with the `spark.hadoop.` prefix.
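For example, instead of editing `hive-site.xml`, the Hive Metastore location can be passed as a prefixed Spark property (the URI below is a placeholder):

```properties
spark.hadoop.hive.metastore.uris  thrift://metastore.server.url:9083
```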
### Kyuubi Frontend Service options
Name|Default|Description
---|---|---
hive.server2.thrift.bind.host | (none) | Bind host on which to run the Kyuubi Frontend service.
hive.server2.thrift.port| 10000 | Port number of Kyuubi Frontend service.
hive.server2.thrift.min.worker.threads| 5 | Minimum number of Thrift worker threads.
hive.server2.thrift.max.worker.threads|500|Maximum number of Thrift worker threads.
hive.server2.thrift.worker.keepalive.time|60s|Keepalive time for an idle worker thread. When the number of workers exceeds the minimum, excess threads are killed after this time interval.
hive.server2.authentication|NONE|Client authentication type. NONE: no authentication check; KERBEROS: Kerberos/GSSAPI authentication.
hive.server2.allow.user.substitution | true | Allow alternate user to be specified as part of Kyuubi open connection request.
hive.server2.enable.doAs | true | Set true to have Kyuubi execute SQL operations as the user making the calls to it.
hive.server2.authentication.kerberos.keytab|(none)|Kerberos keytab file for the server principal.
hive.server2.authentication.kerberos.principal|(none)|Kerberos server principal.
hive.server2.thrift.max.message.size | 104857600 | Maximum message size in bytes a Kyuubi server will accept.
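These options can be set in the same `hive-site.xml` under `$SPARK_HOME/conf`. For example, to bind the frontend to a specific host and a non-default port (the values below are placeholders):

```xml
<configuration>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>kyuubi.server.url</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10009</value>
  </property>
</configuration>
```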
## Hadoop Configurations
Please refer to the [Apache Hadoop](http://hadoop.apache.org)'s online documentation for an overview on how to configure Hadoop.