# Configuration Guide

[TOC]

Kyuubi provides several kinds of properties to configure the system:

**Kyuubi properties:** control most of the Kyuubi server's own behaviors. Most of them are determined at server startup. They can be treated like normal Spark properties by setting them in the `spark-defaults.conf` file or via the `--conf` parameter in the server starting scripts.

**Spark properties:** become session level options, which are used to instantiate a SparkContext and can be passed to the Kyuubi server via JDBC/ODBC connection strings. Setting them in `$SPARK_HOME/conf/spark-defaults.conf` supplies default values for each session.

**Hive properties:**
1. Hive client options, which are used by SparkSession to talk to the Hive MetaStore server, can be configured in a `hive-site.xml` file placed in the `$SPARK_HOME/conf` directory, or set as Spark properties with the `spark.hadoop.` prefix.
2. Kyuubi Frontend Service options are a small subset of the HiveServer2 options.

**Hadoop properties:** set `HADOOP_CONF_DIR` or `YARN_CONF_DIR` to the directory containing the Hadoop configuration files.

**Logging** can be configured through `$SPARK_HOME/conf/log4j.properties`.

## Kyuubi Configurations

Kyuubi properties control most of the Kyuubi server's own behaviors. Most of them are determined at server startup. They can be treated like normal Spark properties by setting them in the `spark-defaults.conf` file or via the `--conf` parameter in the server starting scripts. For instance, start Kyuubi with HA enabled:

```bash
$ bin/start-kyuubi.sh \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.kyuubi.ha.enabled=true \
    --conf spark.kyuubi.ha.zk.quorum=zk1.server.url,zk2.server.url
```

#### High Availability

Name|Default|Description
---|---|---
spark.kyuubi.ha.enabled|false|Whether KyuubiServer supports dynamic service discovery for its clients. To support this, each instance of KyuubiServer currently uses ZooKeeper to register itself when it is brought up. JDBC/ODBC clients should use the ZooKeeper ensemble `spark.kyuubi.ha.zk.quorum` in their connection string.
spark.kyuubi.ha.zk.quorum|none|Comma separated list of ZooKeeper servers to talk to, when KyuubiServer supports service discovery via ZooKeeper.
spark.kyuubi.ha.zk.namespace|kyuubiserver|The parent node in ZooKeeper used by KyuubiServer when supporting dynamic service discovery.
spark.kyuubi.ha.zk.client.port|2181|The port of the ZooKeeper servers to talk to. If the list of ZooKeeper servers specified in `spark.kyuubi.ha.zk.quorum` does not contain port numbers, this value is used.
spark.kyuubi.ha.zk.session.timeout|1,200,000|ZooKeeper client's session timeout (in milliseconds). The client is disconnected, and as a result all locks are released, if a heartbeat is not sent within the timeout.
spark.kyuubi.ha.zk.connection.basesleeptime|1,000|Initial amount of time (in milliseconds) to wait between retries when connecting to the ZooKeeper server with the ExponentialBackoffRetry policy.
spark.kyuubi.ha.zk.connection.max.retries|3|Maximum number of retries when connecting to the ZooKeeper server.

#### Operation Log

Name|Default|Description
---|---|---
spark.kyuubi.logging.operation.enabled|true|When true, KyuubiServer saves operation logs and makes them available for clients.
spark.kyuubi.logging.operation.log.dir|`SPARK_LOG_DIR` -> `SPARK_HOME`/operation_logs -> `java.io.tmpdir`/operation_logs|Top level directory where operation logs are stored, if logging functionality is enabled.

#### Background Execution Thread Pool

Name|Default|Description
---|---|---
spark.kyuubi.async.exec.threads|100|Number of threads in the async thread pool for KyuubiServer.
spark.kyuubi.async.exec.wait.queue.size|100|Size of the wait queue for the async thread pool in KyuubiServer. After hitting this limit, the async thread pool will reject new requests.
spark.kyuubi.async.exec.keep.alive.time|10,000|Time (in milliseconds) that an idle KyuubiServer async thread (from the thread pool) will wait for a new task to arrive before terminating.
spark.kyuubi.async.exec.shutdown.timeout|10,000|How long (in milliseconds) KyuubiServer shutdown will wait for async threads to terminate.

#### Session Idle Check

Name|Default|Description
---|---|---
spark.kyuubi.frontend.session.check.interval|6h|The check interval for frontend session/operation timeout, which can be disabled by setting it to zero or a negative value.
spark.kyuubi.frontend.session.timeout|8h|The timeout for frontend sessions/operations, which can be disabled by setting it to zero or a negative value.
spark.kyuubi.frontend.session.check.operation|true|A session is considered idle only if there is no activity and no pending operation. This setting takes effect only if the session idle timeout `spark.kyuubi.frontend.session.timeout` and the check `spark.kyuubi.frontend.session.check.interval` are enabled.
spark.kyuubi.backend.session.check.interval|20min|The check interval for backend session (a.k.a. SparkSession) timeout.

#### On Spark Session Init

Name|Default|Description
---|---|---
spark.kyuubi.backend.session.wait.other.times|60|How many times to check whether another session with the same user is initializing a SparkContext. The total wait time is this value multiplied by `spark.kyuubi.backend.session.wait.other.interval`.
spark.kyuubi.backend.session.wait.other.interval|1s|The interval for checking whether another thread with the same user has completed SparkContext instantiation.
spark.kyuubi.backend.session.init.timeout|60s|How long to wait before the server gives up instantiating a SparkContext.

---

## Spark Configurations

Spark properties become session level options, which are used to instantiate a SparkContext and can be passed to the Kyuubi server via JDBC/ODBC connection strings. Setting them in `$SPARK_HOME/conf/spark-defaults.conf` supplies default values for each session.
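As a sketch of what a connection string carrying session-level Spark properties might look like, assuming the HiveServer2-style JDBC URL format where variables appended after `#` are picked up as Spark properties for the session (the hostname, user name, and property values below are placeholders):

```bash
# Open a session with beeline; the spark.* variables after "#" apply only
# to the SparkContext created for this connection, overriding the defaults
# from spark-defaults.conf. Host, user, and values are placeholders.
$ bin/beeline \
    -u "jdbc:hive2://kyuubi.server.url:10000/default#spark.executor.instances=10;spark.executor.memory=4g" \
    -n username
```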
Name|Default|Description
---|---|---
spark.driver.memory|1g|Amount of memory to use for the Kyuubi server instance. Set this through the `--driver-memory` command line option or in your default properties file.
spark.driver.extraJavaOptions|(none)|A string of extra JVM options to pass to the Kyuubi server instance, for instance GC settings or other logging options. Set this through the `--driver-java-options` command line option or in your default properties file.

Spark properties for the [Driver](http://spark.apache.org/docs/latest/configuration.html#runtime-environment), like those above, control Kyuubi Server's own behaviors, while other properties can be set in JDBC/ODBC connection strings. Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html) in the online documentation for an overview of how to configure Spark.

## Hive Configurations

### Hive client options

These configurations, which are used by SparkSession to talk to the Hive MetaStore server, can be configured in a `hive-site.xml` file placed in the `$SPARK_HOME/conf` directory, or set as Spark properties with the `spark.hadoop.` prefix.

### Kyuubi Frontend Service options

Name|Default|Description
---|---|---
hive.server2.thrift.bind.host|(none)|Bind host on which to run the Kyuubi Frontend service.
hive.server2.thrift.port|10000|Port number of the Kyuubi Frontend service.
hive.server2.thrift.min.worker.threads|5|Minimum number of Thrift worker threads.
hive.server2.thrift.max.worker.threads|500|Maximum number of Thrift worker threads.
hive.server2.thrift.worker.keepalive.time|60s|Keepalive time (in seconds) for an idle worker thread. When the number of workers exceeds the minimum, excess threads are killed after this time interval.
hive.server2.authentication|NONE|Client authentication type. NONE: no authentication check. KERBEROS: Kerberos/GSSAPI authentication.
hive.server2.allow.user.substitution|true|Allow an alternate user to be specified as part of a Kyuubi open connection request.
hive.server2.enable.doAs|true|Set to true to have Kyuubi execute SQL operations as the user making the call to it.
hive.server2.authentication.kerberos.keytab|(none)|Kerberos keytab file for the server principal.
hive.server2.authentication.kerberos.principal|(none)|Kerberos server principal.
hive.server2.thrift.max.message.size|104857600|Maximum message size in bytes that the Kyuubi server will accept.

## Hadoop Configurations

Please refer to [Apache Hadoop](http://hadoop.apache.org)'s online documentation for an overview of how to configure Hadoop.
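As a minimal sketch, assuming a typical cluster layout where the client-side Hadoop configuration files live under `/etc/hadoop/conf` (a placeholder path; substitute your cluster's actual directory), the environment can be pointed at them before starting Kyuubi:

```bash
# Point Kyuubi/Spark at the Hadoop client configuration files
# (core-site.xml, hdfs-site.xml, yarn-site.xml, ...).
# /etc/hadoop/conf is a placeholder for your cluster's config directory.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
```

These exports are typically placed in the shell profile or in `$SPARK_HOME/conf/spark-env.sh` so every server start picks them up.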