[KYUUBI #2333 ][KYUUBI #2554 ] Configuring Flink Engine heap memory and java opts

### _Why are the changes needed?_

fix #2554

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #2579 from jiaoqingbo/kyuubi2554.

Closes #2333

Closes #2554

f0365c91 [jiaoqingbo] code review
1700aab9 [jiaoqingbo] code review
1ca10a65 [jiaoqingbo] fix ut failed
b53dcdd4 [jiaoqingbo] code review
f9ceb72c [jiaoqingbo] [KYUUBI #2554] Configuring Flink Engine heap memory and java opts

Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>

2022-05-10 12:31:01 +08:00


Introduction to the Kyuubi Configurations System

Kyuubi provides several ways to configure the system and corresponding engines.

Environments

You can configure the environment variables in $KYUUBI_HOME/conf/kyuubi-env.sh, e.g. JAVA_HOME; this Java runtime will then be used both for the Kyuubi server instance and for the applications it launches. You can also change the variable in the subprocess's env configuration file, e.g. $SPARK_HOME/conf/spark-env.sh, to use a more specific ENV for SQL engine applications.

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# - JAVA_HOME               Java runtime to use. By default use "java" from PATH.
#
#
# - KYUUBI_CONF_DIR         Directory containing the Kyuubi configurations to use.
#                           (Default: $KYUUBI_HOME/conf)
# - KYUUBI_LOG_DIR          Directory for Kyuubi server-side logs.
#                           (Default: $KYUUBI_HOME/logs)
# - KYUUBI_PID_DIR          Directory stores the Kyuubi instance pid file.
#                           (Default: $KYUUBI_HOME/pid)
# - KYUUBI_MAX_LOG_FILES    Maximum number of Kyuubi server logs can rotate to.
#                           (Default: 5)
# - KYUUBI_JAVA_OPTS        JVM options for the Kyuubi server itself in the form "-Dx=y".
#                           (Default: none).
# - KYUUBI_CTL_JAVA_OPTS    JVM options for the Kyuubi ctl itself in the form "-Dx=y".
#                           (Default: none).
# - KYUUBI_BEELINE_OPTS     JVM options for the Kyuubi BeeLine in the form "-Dx=Y".
#                           (Default: none)
# - KYUUBI_NICENESS         The scheduling priority for Kyuubi server.
#                           (Default: 0)
# - KYUUBI_WORK_DIR_ROOT    Root directory for launching sql engine applications.
#                           (Default: $KYUUBI_HOME/work)
# - HADOOP_CONF_DIR         Directory containing the Hadoop / YARN configuration to use.
# - YARN_CONF_DIR           Directory containing the YARN configuration to use.
#
# - SPARK_HOME              Spark distribution which you would like to use in Kyuubi.
# - SPARK_CONF_DIR          Optional directory where the Spark configuration lives.
#                           (Default: $SPARK_HOME/conf)
# - FLINK_HOME              Flink distribution which you would like to use in Kyuubi.
# - FLINK_CONF_DIR          Optional directory where the Flink configuration lives.
#                           (Default: $FLINK_HOME/conf)
# - FLINK_HADOOP_CLASSPATH  Required Hadoop jars when you use the Kyuubi Flink engine.
# - HIVE_HOME               Hive distribution which you would like to use in Kyuubi.
# - HIVE_CONF_DIR           Optional directory where the Hive configuration lives.
#                           (Default: $HIVE_HOME/conf)
# - HIVE_HADOOP_CLASSPATH   Required Hadoop jars when you use the Kyuubi Hive engine.
#


## Examples ##

# export JAVA_HOME=/usr/jdk64/jdk1.8.0_152
# export SPARK_HOME=/opt/spark
# export FLINK_HOME=/opt/flink
# export HIVE_HOME=/opt/hive
# export FLINK_HADOOP_CLASSPATH=/path/to/hadoop-client-runtime-3.3.2.jar:/path/to/hadoop-client-api-3.3.2.jar
# export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar
# export HADOOP_CONF_DIR=/usr/ndp/current/mapreduce_client/conf
# export YARN_CONF_DIR=/usr/ndp/current/yarn/conf
# export KYUUBI_JAVA_OPTS="-Xmx10g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -XX:MaxDirectMemorySize=1024m  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:NewRatio=3 -XX:MetaspaceSize=512m"
# export KYUUBI_BEELINE_OPTS="-Xmx2g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark"

For environment variables that only need to be passed to the engine side, you can set them with a Kyuubi configuration item of the form kyuubi.engineEnv.VAR_NAME. For example, with kyuubi.engineEnv.SPARK_DRIVER_MEMORY=4g, the environment variable SPARK_DRIVER_MEMORY with value 4g is passed to the engine side. With kyuubi.engineEnv.SPARK_CONF_DIR=/apache/confs/spark/conf, the value of SPARK_CONF_DIR on the engine side is set to /apache/confs/spark/conf.
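
The prefix convention above can be sketched as a small shell function. This is only an illustration of the naming rule, not how Kyuubi itself processes its configuration; the function name `engine_env_from_conf` and the `key=value` input format are assumptions made for this example.

```shell
# Illustrative sketch only, not Kyuubi's implementation.
# Reads key=value lines from stdin and prints NAME=value for every key
# that carries the kyuubi.engineEnv. prefix; all other lines are skipped.
engine_env_from_conf() {
  prefix="kyuubi.engineEnv."
  while IFS= read -r line; do
    case "$line" in
      "$prefix"*=*)
        key="${line%%=*}"        # text before the first '='
        value="${line#*=}"       # text after the first '='
        name=${key#"$prefix"}    # strip the prefix to get the variable name
        printf '%s=%s\n' "$name" "$value"
        ;;
    esac
  done
}

engine_env_from_conf <<'EOF'
kyuubi.engineEnv.SPARK_DRIVER_MEMORY=4g
kyuubi.engineEnv.SPARK_CONF_DIR=/apache/confs/spark/conf
kyuubi.frontend.bind.port=10009
EOF
```

Run as written, the call prints `SPARK_DRIVER_MEMORY=4g` and `SPARK_CONF_DIR=/apache/confs/spark/conf`; the unrelated `kyuubi.frontend.bind.port` entry is skipped.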

Kyuubi Configurations

You can configure the Kyuubi properties in $KYUUBI_HOME/conf/kyuubi-defaults.conf. For example:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

## Kyuubi Configurations

#
# kyuubi.authentication           NONE
# kyuubi.frontend.bind.host       localhost
# kyuubi.frontend.bind.port       10009
#

# Details in https://kyuubi.apache.org/docs/latest/deployment/settings.html

Authentication

Key	Default	Meaning	Type	Since
`kyuubi.authentication`	NONE	A comma separated list of client authentication types. NOSASL: raw transport. NONE: no authentication check. KERBEROS: Kerberos/GSSAPI authentication. CUSTOM: User-defined authentication. LDAP: Lightweight Directory Access Protocol authentication. Note that: For KERBEROS, it is SASL/GSSAPI mechanism, and for NONE, CUSTOM and LDAP, they are all SASL/PLAIN mechanism. If only NOSASL is specified, the authentication will be NOSASL. For SASL authentication, KERBEROS and PLAIN auth type are supported at the same time, and only the first specified PLAIN auth type is valid.	seq	1.0.0
`kyuubi.authentication.custom.class`	<undefined>	User-defined authentication implementation of org.apache.kyuubi.service.authentication.PasswdAuthenticationProvider	string	1.3.0
`kyuubi.authentication.ldap.base.dn`	<undefined>	LDAP base DN.	string	1.0.0
`kyuubi.authentication.ldap.domain`	<undefined>	LDAP domain.	string	1.0.0
`kyuubi.authentication.ldap.guidKey`	uid	LDAP attribute name whose values are unique in this LDAP server. For example: uid or cn.	string	1.2.0
`kyuubi.authentication.ldap.url`	<undefined>	SPACE character separated LDAP connection URL(s).	string	1.0.0
`kyuubi.authentication.sasl.qop`	auth	SASL QOP enables higher levels of protection for Kyuubi communication with clients. auth: authentication only (default). auth-int: authentication plus integrity protection. auth-conf: authentication plus integrity and confidentiality protection. This is applicable only if Kyuubi is configured to use Kerberos authentication.	string	1.0.0

Backend

Key	Default	Meaning	Type	Since
`kyuubi.backend.engine.exec.pool.keepalive.time`	PT1M	Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in SQL engine applications	duration	1.0.0
`kyuubi.backend.engine.exec.pool.shutdown.timeout`	PT10S	Timeout(ms) for the operation execution thread pool to terminate in SQL engine applications	duration	1.0.0
`kyuubi.backend.engine.exec.pool.size`	100	Number of threads in the operation execution thread pool of SQL engine applications	int	1.0.0
`kyuubi.backend.engine.exec.pool.wait.queue.size`	100	Size of the wait queue for the operation execution thread pool in SQL engine applications	int	1.0.0
`kyuubi.backend.server.event.json.log.path`	file:///tmp/kyuubi/events	The location where server events go for the built-in JSON logger	string	1.4.0
`kyuubi.backend.server.event.loggers`		A comma separated list of server history loggers, where session/operation etc events go. JSON: the events will be written to the location of kyuubi.backend.server.event.json.log.path JDBC: to be done CUSTOM: to be done.	seq	1.4.0
`kyuubi.backend.server.exec.pool.keepalive.time`	PT1M	Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in Kyuubi server	duration	1.0.0
`kyuubi.backend.server.exec.pool.shutdown.timeout`	PT10S	Timeout(ms) for the operation execution thread pool to terminate in Kyuubi server	duration	1.0.0
`kyuubi.backend.server.exec.pool.size`	100	Number of threads in the operation execution thread pool of Kyuubi server	int	1.0.0
`kyuubi.backend.server.exec.pool.wait.queue.size`	100	Size of the wait queue for the operation execution thread pool of Kyuubi server	int	1.0.0

Batch

Key	Default	Meaning	Type	Since
`kyuubi.batch.application.check.interval`	PT5S	The interval to check batch job application information.	duration	1.6.0
`kyuubi.batch.conf.ignore.list`		A comma separated list of ignored keys for batch conf. If the batch conf contains any of them, the key and the corresponding value will be removed silently during batch job submission. Note that this rule is a server-side protection defined by administrators to prevent essential configs from being tampered with. You can also pre-define some configs for batch job submission with the prefix kyuubi.batchConf.[batchType]. For example, you can pre-define `spark.master` for spark batch jobs with the key `kyuubi.batchConf.spark.spark.master`.	seq	1.6.0
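
As a rough illustration of the ignore-list behaviour described for `kyuubi.batch.conf.ignore.list`, the sketch below drops every `key=value` line whose key appears in a comma separated list. The function name `filter_ignored_conf`, the `key=value` input format, and the key `spark.submit.deployMode` are assumptions chosen for the example; Kyuubi's own filtering is done inside the server and may differ in detail.

```shell
# Illustrative sketch only, not Kyuubi's implementation.
# $1 is a comma separated ignore list (no spaces around the commas).
# Reads key=value lines from stdin and prints only the lines whose key
# is NOT on the ignore list.
filter_ignored_conf() {
  ignore_list="$1"
  while IFS= read -r line; do
    key="${line%%=*}"
    case ",$ignore_list," in
      *",$key,"*) ;;                   # key is ignored: drop it silently
      *) printf '%s\n' "$line" ;;      # key is allowed: keep the line
    esac
  done
}

filter_ignored_conf "spark.master,spark.submit.deployMode" <<'EOF'
spark.master=local
spark.executor.memory=2g
EOF
```

Here only `spark.executor.memory=2g` is printed, because `spark.master` is on the ignore list.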

Credentials

Key	Default	Meaning	Type	Since
`kyuubi.credentials.check.interval`	PT5M	The interval to check the expiration of cached <user, CredentialsRef> pairs.	duration	1.6.0
`kyuubi.credentials.hadoopfs.enabled`	true	Whether to renew Hadoop filesystem delegation tokens	boolean	1.4.0
`kyuubi.credentials.hadoopfs.uris`		Extra Hadoop filesystem URIs for which to request delegation tokens. The filesystem that hosts fs.defaultFS does not need to be listed here.	seq	1.4.0
`kyuubi.credentials.hive.enabled`	true	Whether to renew Hive metastore delegation token	boolean	1.4.0
`kyuubi.credentials.idle.timeout`	PT6H	Inactive users' credentials will expire after the configured timeout.	duration	1.6.0
`kyuubi.credentials.renewal.interval`	PT1H	How often Kyuubi renews one user's delegation tokens	duration	1.4.0
`kyuubi.credentials.renewal.retry.wait`	PT1M	How long to wait before retrying to fetch new credentials after a failure.	duration	1.4.0
`kyuubi.credentials.update.wait.timeout`	PT1M	How long to wait until credentials are ready.	duration	1.5.0

Delegation

Key	Default	Meaning	Type	Since
`kyuubi.delegation.key.update.interval`	PT24H	unused yet	duration	1.0.0
`kyuubi.delegation.token.gc.interval`	PT1H	unused yet	duration	1.0.0
`kyuubi.delegation.token.max.lifetime`	PT168H	unused yet	duration	1.0.0
`kyuubi.delegation.token.renew.interval`	PT168H	unused yet	duration	1.0.0

Engine

Key	Default	Meaning	Type	Since
`kyuubi.engine.connection.url.use.hostname`	true	(deprecated) When true, the engine registers with its hostname to ZooKeeper. When Spark runs on K8s in cluster mode, set to false to ensure that the server can connect to the engine	boolean	1.3.0
`kyuubi.engine.deregister.exception.classes`		A comma separated list of exception classes. If there is any exception thrown, whose class matches the specified classes, the engine would deregister itself.	seq	1.2.0
`kyuubi.engine.deregister.exception.messages`		A comma separated list of exception messages. If there is any exception thrown, whose message or stacktrace matches the specified message list, the engine would deregister itself.	seq	1.2.0
`kyuubi.engine.deregister.exception.ttl`	PT30M	Time to live (TTL) for the exception patterns specified in kyuubi.engine.deregister.exception.classes and kyuubi.engine.deregister.exception.messages to deregister engines. Once the total error count hits kyuubi.engine.deregister.job.max.failures within the TTL, the engine will deregister itself and wait to self-terminate. Otherwise, we suppose that the engine has recovered from temporary failures.	duration	1.2.0
`kyuubi.engine.deregister.job.max.failures`	4	Number of failures of job before deregistering the engine.	int	1.2.0
`kyuubi.engine.event.json.log.path`	file:///tmp/kyuubi/events	The location where all engine events go for the built-in JSON logger. Local path: starts with 'file://'. HDFS path: starts with 'hdfs://'.	string	1.3.0
`kyuubi.engine.event.loggers`	SPARK	A comma separated list of engine history loggers, where engine/session/operation etc events go. We use spark logger by default. SPARK: the events will be written to the spark listener bus. JSON: the events will be written to the location of kyuubi.engine.event.json.log.path JDBC: to be done CUSTOM: to be done.	seq	1.3.0
`kyuubi.engine.flink.extra.classpath`	<undefined>	The extra classpath for the flink sql engine, for configuring location of hadoop client jars, etc	string	1.6.0
`kyuubi.engine.flink.java.options`	<undefined>	The extra java options for the flink sql engine	string	1.6.0
`kyuubi.engine.flink.memory`	1g	The heap memory for the flink sql engine	string	1.6.0
`kyuubi.engine.hive.extra.classpath`	<undefined>	The extra classpath for the hive query engine, for configuring location of hadoop client jars, etc	string	1.6.0
`kyuubi.engine.hive.java.options`	<undefined>	The extra java options for the hive query engine	string	1.6.0
`kyuubi.engine.hive.memory`	1g	The heap memory for the hive query engine	string	1.6.0
`kyuubi.engine.initialize.sql`	SHOW DATABASES	Semicolon-separated list of SQL statements to be run in the newly created engine before queries, e.g. use `SHOW DATABASES` to eagerly activate HiveClient. This configuration cannot be used in the JDBC URL due to the limitation of the Beeline/JDBC driver.	seq	1.2.0
`kyuubi.engine.operation.log.dir.root`	engine_operation_logs	Root directory for query operation log at engine-side.	string	1.4.0
`kyuubi.engine.pool.name`	engine-pool	The name of engine pool.	string	1.5.0
`kyuubi.engine.pool.size`	-1	The size of engine pool. Note that, if the size is less than 1, the engine pool will not be enabled; otherwise, the size of the engine pool will be min(this, kyuubi.engine.pool.size.threshold).	int	1.4.0
`kyuubi.engine.pool.size.threshold`	9	This parameter is introduced as a server-side parameter, and controls the upper limit of the engine pool.	int	1.4.0
`kyuubi.engine.security.crypto.cipher`	AES/CBC/PKCS5PADDING	The cipher transformation to use for encrypting engine access token.	string	1.5.0
`kyuubi.engine.security.crypto.ivLength`	16	Initial vector length, in bytes.	int	1.5.0
`kyuubi.engine.security.crypto.keyAlgorithm`	AES	The algorithm for generated secret keys.	string	1.5.0
`kyuubi.engine.security.crypto.keyLength`	128	The length in bits of the encryption key to generate. Valid values are 128, 192 and 256	int	1.5.0
`kyuubi.engine.security.enabled`	false	Whether to enable the internal secure access between Kyuubi server and engine.	boolean	1.5.0
`kyuubi.engine.security.secret.provider`	org.apache.kyuubi.service.authentication.ZooKeeperEngineSecuritySecretProviderImpl	The class used to manage the engine security secret. This class must be a subclass of EngineSecuritySecretProvider.	string	1.5.0
`kyuubi.engine.security.token.max.lifetime`	PT10M	The max lifetime of the token used for secure access between Kyuubi server and engine.	duration	1.5.0
`kyuubi.engine.session.initialize.sql`		Semicolon-separated list of SQL statements to be run in the newly created engine session before queries. This configuration cannot be used in the JDBC URL due to the limitation of the Beeline/JDBC driver.	seq	1.3.0
`kyuubi.engine.share.level`	USER	Engines will be shared at different levels. Available configs are: CONNECTION: the engine will not be shared but only used by the current client connection. USER: the engine will be shared by all sessions created by a unique username, see also kyuubi.engine.share.level.subdomain. GROUP: the engine will be shared by all sessions created by all users belonging to the same primary group name. The engine will be launched with the group name as the effective username, so here the group name is a kind of special user who is able to access the compute resources/data of a team. It follows the Hadoop GroupsMapping to map a user to a primary group. If the primary group is not found, it falls back to the USER level. SERVER: the App will be shared by Kyuubi servers.	string	1.2.0
`kyuubi.engine.share.level.sub.domain`	<undefined>	(deprecated) Use kyuubi.engine.share.level.subdomain instead	string	1.2.0
`kyuubi.engine.share.level.subdomain`	<undefined>	Allow end-users to create a subdomain for the share level of an engine. A subdomain is a case-insensitive string value that must be a valid ZooKeeper sub path. For example, for the `USER` share level, an end-user can share a certain engine within a subdomain, not with all of its clients. End-users are free to create multiple engines in the `USER` share level. When the engine pool is disabled, 'default' is used if absent.	string	1.4.0
`kyuubi.engine.single.spark.session`	false	When set to true, this engine is running in a single session mode. All the JDBC/ODBC connections share the temporary views, function registries, SQL configuration and the current database.	boolean	1.3.0
`kyuubi.engine.trino.extra.classpath`	<undefined>	The extra classpath for the trino query engine, for configuring other libs which may be needed by the trino engine	string	1.6.0
`kyuubi.engine.trino.java.options`	<undefined>	The extra java options for the trino query engine	string	1.6.0
`kyuubi.engine.trino.memory`	1g	The heap memory for the trino query engine	string	1.6.0
`kyuubi.engine.type`	SPARK_SQL	Specify the detailed engine supported by Kyuubi. The engine type is bound to the SESSION scope. This configuration is experimental. Currently, available configs are: SPARK_SQL: specifying this engine type will launch a Spark engine which can provide all the capacity of Apache Spark. Note, it is the default engine type. FLINK_SQL: specifying this engine type will launch a Flink engine which can provide all the capacity of Apache Flink. TRINO: specifying this engine type will launch a Trino engine which can provide all the capacity of Trino.	string	1.4.0
`kyuubi.engine.ui.retainedSessions`	200	The number of SQL client sessions kept in the Kyuubi Query Engine web UI.	int	1.4.0
`kyuubi.engine.ui.retainedStatements`	200	The number of statements kept in the Kyuubi Query Engine web UI.	int	1.4.0
`kyuubi.engine.ui.stop.enabled`	true	When true, allows Kyuubi engine to be killed from the Spark Web UI.	boolean	1.3.0
`kyuubi.engine.user.isolated.spark.session`	true	When set to false, if the engine is running in a group or server share level, all the JDBC/ODBC connections will be isolated against the user. Including: the temporary views, function registries, SQL configuration and the current database. Note that, it does not affect if the share level is connection or user.	boolean	1.6.0
`kyuubi.engine.user.isolated.spark.session.idle.interval`	PT1M	The interval to check whether the user-isolated Spark session has timed out.	duration	1.6.0
`kyuubi.engine.user.isolated.spark.session.idle.timeout`	PT6H	If kyuubi.engine.user.isolated.spark.session is false, we will release the spark session if its corresponding user is inactive after this configured timeout.	duration	1.6.0
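
The interaction between `kyuubi.engine.pool.size` and `kyuubi.engine.pool.size.threshold` described in the table above can be worked through with a short sketch. The function name `effective_engine_pool_size` and the `disabled` marker are assumptions made for this example; only the rule itself (below 1 means disabled, otherwise the minimum of the two values) comes from the table.

```shell
# Illustrative sketch only, not Kyuubi's implementation.
# $1 = kyuubi.engine.pool.size, $2 = kyuubi.engine.pool.size.threshold.
# Prints "disabled" when the size is below 1, otherwise min(size, threshold).
effective_engine_pool_size() {
  size="$1"
  threshold="$2"
  if [ "$size" -lt 1 ]; then
    echo "disabled"
  elif [ "$size" -lt "$threshold" ]; then
    echo "$size"
  else
    echo "$threshold"
  fi
}

effective_engine_pool_size -1 9   # table defaults: pool not enabled
effective_engine_pool_size 3 9    # below the threshold: size is used
effective_engine_pool_size 20 9   # above the threshold: capped
```

The three calls print `disabled`, `3`, and `9` respectively.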

Frontend

Key	Default	Meaning	Type	Since
`kyuubi.frontend.backoff.slot.length`	PT0.1S	(deprecated) Time to back off during login to the thrift frontend service.	duration	1.0.0
`kyuubi.frontend.bind.host`	<undefined>	(deprecated) Hostname or IP of the machine on which to run the thrift frontend service via binary protocol.	string	1.0.0
`kyuubi.frontend.bind.port`	10009	(deprecated) Port of the machine on which to run the thrift frontend service via binary protocol.	int	1.0.0
`kyuubi.frontend.connection.url.use.hostname`	true	When true, frontend services prefer hostname, otherwise, ip address	boolean	1.5.0
`kyuubi.frontend.login.timeout`	PT20S	(deprecated) Timeout for Thrift clients during login to the thrift frontend service.	duration	1.0.0
`kyuubi.frontend.max.message.size`	104857600	(deprecated) Maximum message size in bytes a Kyuubi server will accept.	int	1.0.0
`kyuubi.frontend.max.worker.threads`	999	(deprecated) Maximum number of threads in the frontend worker thread pool for the thrift frontend service	int	1.0.0
`kyuubi.frontend.min.worker.threads`	9	(deprecated) Minimum number of threads in the frontend worker thread pool for the thrift frontend service	int	1.0.0
`kyuubi.frontend.mysql.bind.host`	<undefined>	Hostname or IP of the machine on which to run the MySQL frontend service.	string	1.4.0
`kyuubi.frontend.mysql.bind.port`	3309	Port of the machine on which to run the MySQL frontend service.	int	1.4.0
`kyuubi.frontend.mysql.max.worker.threads`	999	Maximum number of threads in the command execution thread pool for the MySQL frontend service	int	1.4.0
`kyuubi.frontend.mysql.min.worker.threads`	9	Minimum number of threads in the command execution thread pool for the MySQL frontend service	int	1.4.0
`kyuubi.frontend.mysql.netty.worker.threads`	<undefined>	Number of threads in the netty worker event loop of the MySQL frontend service. Uses min(cpu_cores, 8) by default.	int	1.4.0
`kyuubi.frontend.mysql.worker.keepalive.time`	PT1M	Time(ms) that an idle async thread of the command execution thread pool will wait for a new task to arrive before terminating in MySQL frontend service	duration	1.4.0
`kyuubi.frontend.protocols`	THRIFT_BINARY	A comma separated list for all frontend protocols THRIFT_BINARY - HiveServer2 compatible thrift binary protocol. REST - Kyuubi defined REST API(experimental). MYSQL - MySQL compatible text protocol(experimental).	seq	1.4.0
`kyuubi.frontend.rest.bind.host`	<undefined>	Hostname or IP of the machine on which to run the REST frontend service.	string	1.4.0
`kyuubi.frontend.rest.bind.port`	10099	Port of the machine on which to run the REST frontend service.	int	1.4.0
`kyuubi.frontend.thrift.backoff.slot.length`	PT0.1S	Time to back off during login to the thrift frontend service.	duration	1.4.0
`kyuubi.frontend.thrift.binary.bind.host`	<undefined>	Hostname or IP of the machine on which to run the thrift frontend service via binary protocol.	string	1.4.0
`kyuubi.frontend.thrift.binary.bind.port`	10009	Port of the machine on which to run the thrift frontend service via binary protocol.	int	1.4.0
`kyuubi.frontend.thrift.login.timeout`	PT20S	Timeout for Thrift clients during login to the thrift frontend service.	duration	1.4.0
`kyuubi.frontend.thrift.max.message.size`	104857600	Maximum message size in bytes a Kyuubi server will accept.	int	1.4.0
`kyuubi.frontend.thrift.max.worker.threads`	999	Maximum number of threads in the frontend worker thread pool for the thrift frontend service	int	1.4.0
`kyuubi.frontend.thrift.min.worker.threads`	9	Minimum number of threads in the frontend worker thread pool for the thrift frontend service	int	1.4.0
`kyuubi.frontend.thrift.worker.keepalive.time`	PT1M	Keep-alive time (in milliseconds) for an idle worker thread	duration	1.4.0
`kyuubi.frontend.worker.keepalive.time`	PT1M	(deprecated) Keep-alive time (in milliseconds) for an idle worker thread	duration	1.0.0

Ha

Key	Default	Meaning	Type	Since
`kyuubi.ha.zookeeper.acl.enabled`	false	Set to true if the zookeeper ensemble is kerberized	boolean	1.0.0
`kyuubi.ha.zookeeper.auth.digest`	<undefined>	The digest auth string is used for zookeeper authentication, like: username:password.	string	1.3.2
`kyuubi.ha.zookeeper.auth.keytab`	<undefined>	Location of Kyuubi server's keytab is used for zookeeper authentication.	string	1.3.2
`kyuubi.ha.zookeeper.auth.principal`	<undefined>	Name of the Kerberos principal is used for zookeeper authentication.	string	1.3.2
`kyuubi.ha.zookeeper.auth.type`	NONE	The type of zookeeper authentication, all candidates are NONE KERBEROS DIGEST	string	1.3.2
`kyuubi.ha.zookeeper.connection.base.retry.wait`	1000	Initial amount of time to wait between retries to the zookeeper ensemble	int	1.0.0
`kyuubi.ha.zookeeper.connection.max.retries`	3	Max retry times for connecting to the zookeeper ensemble	int	1.0.0
`kyuubi.ha.zookeeper.connection.max.retry.wait`	30000	Max amount of time to wait between retries for BOUNDED_EXPONENTIAL_BACKOFF policy can reach, or max time until elapsed for UNTIL_ELAPSED policy to connect the zookeeper ensemble	int	1.0.0
`kyuubi.ha.zookeeper.connection.retry.policy`	EXPONENTIAL_BACKOFF	The retry policy for connecting to the zookeeper ensemble, all candidates are: ONE_TIME N_TIME EXPONENTIAL_BACKOFF BOUNDED_EXPONENTIAL_BACKOFF UNTIL_ELAPSED	string	1.0.0
`kyuubi.ha.zookeeper.connection.timeout`	15000	The timeout(ms) of creating the connection to the zookeeper ensemble	int	1.0.0
`kyuubi.ha.zookeeper.engine.auth.type`	NONE	The type of zookeeper authentication for engine, all candidates are NONE KERBEROS DIGEST	string	1.3.2
`kyuubi.ha.zookeeper.engine.secure.secret.node`	<undefined>	The ZooKeeper node that contains the secret used for internal secure access between the Kyuubi server and the Kyuubi engine. Please make sure that it is only visible to Kyuubi.	string	1.5.0
`kyuubi.ha.zookeeper.namespace`	kyuubi	The root directory for the service to deploy its instance uri	string	1.0.0
`kyuubi.ha.zookeeper.node.creation.timeout`	PT2M	Timeout for creating zookeeper node	duration	1.2.0
`kyuubi.ha.zookeeper.publish.configs`	false	When set to true, publish Kerberos configs to ZooKeeper. Note that the Hive driver needs to be greater than 1.3 or 2.0, or have the HIVE-11581 patch applied.	boolean	1.4.0
`kyuubi.ha.zookeeper.quorum`		The connection string for the zookeeper ensemble	string	1.0.0
`kyuubi.ha.zookeeper.session.timeout`	60000	The timeout(ms) of a connected session to be idled	int	1.0.0

Kinit

Key	Default	Meaning	Type	Since
`kyuubi.kinit.interval`	PT1H	How often will Kyuubi server run `kinit -kt [keytab] [principal]` to renew the local Kerberos credentials cache	duration	1.0.0
`kyuubi.kinit.keytab`	<undefined>	Location of Kyuubi server's keytab.	string	1.0.0
`kyuubi.kinit.max.attempts`	10	How many times will `kinit` process retry	int	1.0.0
`kyuubi.kinit.principal`	<undefined>	Name of the Kerberos principal.	string	1.0.0
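
To make the role of `kyuubi.kinit.max.attempts` more concrete, here is a minimal retry sketch. It is not Kyuubi's code: the helper name `retry` is hypothetical, the interpretation of "max attempts" as the total number of tries is an assumption, and the keytab path and principal in the commented example are placeholders.

```shell
# Illustrative sketch only, not Kyuubi's implementation.
# Runs the given command up to $1 times and stops at the first success.
# Returns 0 on success, 1 if every attempt failed.
retry() {
  max_attempts="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example with placeholder keytab and principal (not run here):
# retry 10 kinit -kt /path/to/kyuubi.keytab kyuubi/host@EXAMPLE.COM
```

In a real deployment the Kyuubi server performs this renewal itself at every `kyuubi.kinit.interval`; the helper only shows the shape of a bounded retry loop.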

Metrics

Key	Default	Meaning	Type	Since
`kyuubi.metrics.console.interval`	PT5S	How often to report metrics to the console	duration	1.2.0
`kyuubi.metrics.enabled`	true	Set to true to enable kyuubi metrics system	boolean	1.2.0
`kyuubi.metrics.json.interval`	PT5S	How often to report metrics to the JSON file	duration	1.2.0
`kyuubi.metrics.json.location`	metrics	Where the JSON metrics file is located	string	1.2.0
`kyuubi.metrics.prometheus.path`	/metrics	URI context path of prometheus metrics HTTP server	string	1.2.0
`kyuubi.metrics.prometheus.port`	10019	Prometheus metrics HTTP server port	int	1.2.0
`kyuubi.metrics.reporters`	JSON	A comma separated list for all metrics reporters CONSOLE - ConsoleReporter which outputs measurements to CONSOLE periodically. JMX - JmxReporter which listens for new metrics and exposes them as MBeans. JSON - JsonReporter which outputs measurements to json file periodically. PROMETHEUS - PrometheusReporter which exposes metrics in prometheus format. SLF4J - Slf4jReporter which outputs measurements to system log periodically.	seq	1.2.0
`kyuubi.metrics.slf4j.interval`	PT5S	How often to report metrics to the SLF4J logger	duration	1.2.0

Operation

Key	Default	Meaning	Type	Since
`kyuubi.operation.idle.timeout`	PT3H	Operation will be closed when it's not accessed for this duration of time	duration	1.0.0
`kyuubi.operation.interrupt.on.cancel`	true	When true, all running tasks will be interrupted if one cancels a query. When false, all running tasks will remain until finished.	boolean	1.2.0
`kyuubi.operation.language`	SQL	Choose a programming language for the following inputs. SQL: (Default) Run all following statements as SQL queries. SCALA: Run all following input as Scala code.	string	1.5.0
`kyuubi.operation.log.dir.root`	server_operation_logs	Root directory for query operation log at server-side.	string	1.4.0
`kyuubi.operation.plan.only.excludes`	ResetCommand,SetCommand,SetNamespaceCommand,UseStatement	Comma-separated list of query plan names, in the form of simple class names, e.g. for `set abc=xyz`, the value will be `SetCommand`. For auxiliary plans, such as `switch databases`, `set properties`, or `create temporary view` etc., which are used to set up the evaluation environment for analyzing actual queries, we can use this config to exclude them and let them take effect. See also kyuubi.operation.plan.only.mode.	seq	1.5.0
`kyuubi.operation.plan.only.mode`	NONE	Whether to perform the statement in a PARSE, ANALYZE, OPTIMIZE, PHYSICAL, EXECUTION only way without executing the query. When it is NONE, the statement will be fully executed	string	1.4.0
`kyuubi.operation.progress.enabled`	false	Whether to enable the operation progress. When true, the operation progress will be returned in `GetOperationStatus`.	boolean	1.6.0
`kyuubi.operation.query.timeout`	<undefined>	Timeout for query executions at server-side. It takes effect together with the client-side timeout (`java.sql.Statement.setQueryTimeout`); a running query will be cancelled automatically on timeout. It's off by default, which means only the client side fully controls whether the query should time out or not. If set, the client-side timeout is capped at this value. To cancel the queries right away without waiting for tasks to finish, consider enabling kyuubi.operation.interrupt.on.cancel together.	duration	1.2.0
`kyuubi.operation.result.max.rows`	0	Max rows of Spark query results. Rows that exceed the limit are ignored. Set this value to 0 to disable the max rows limit.	int	1.6.0
`kyuubi.operation.scheduler.pool`	<undefined>	The scheduler pool of the job. Note that this config should be used after changing the Spark config to spark.scheduler.mode=FAIR.	string	1.1.1
`kyuubi.operation.status.polling.max.attempts`	5	(deprecated) Use kyuubi.operation.thrift.client.request.max.attempts instead	int	1.4.0
`kyuubi.operation.status.polling.timeout`	PT5S	Timeout(ms) for long polling asynchronous running sql query's status	duration	1.0.0
`kyuubi.operation.thrift.client.request.max.attempts`	5	Max attempts for operation thrift request call at server-side on raw transport failures, e.g. TTransportException	int	1.6.0
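
The interaction between `kyuubi.operation.query.timeout` and the client-side timeout described in the table can be modelled roughly as follows. This is a minimal sketch of the documented behavior only, not Kyuubi's implementation; the function name `effective_query_timeout` and the use of seconds are assumptions made for illustration.

```python
from typing import Optional


def effective_query_timeout(server_timeout_s: Optional[int],
                            client_timeout_s: int) -> Optional[int]:
    """Return the timeout in seconds that applies to a query, or None for no timeout.

    server_timeout_s: kyuubi.operation.query.timeout, None when undefined.
    client_timeout_s: value given to java.sql.Statement.setQueryTimeout,
                      where 0 means the client sets no limit.
    """
    client = client_timeout_s if client_timeout_s > 0 else None
    if server_timeout_s is None:
        # Server side is off: only the client decides.
        return client
    if client is None:
        # Client sets no limit, so the server-side timeout applies.
        return server_timeout_s
    # Both are set: the client-side timeout is capped by the server side.
    return min(server_timeout_s, client)
```

For instance, with a server-side timeout of 60 seconds, a client asking for 300 seconds would be capped at 60, while a client asking for 10 seconds would keep its 10.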

Server

Key	Default	Meaning	Type	Since
`kyuubi.server.limit.connections.per.ipaddress`	<undefined>	Maximum kyuubi server connections per ipaddress. Any user exceeding this limit will not be allowed to connect.	int	1.6.0
`kyuubi.server.limit.connections.per.user`	<undefined>	Maximum kyuubi server connections per user. Any user exceeding this limit will not be allowed to connect.	int	1.6.0
`kyuubi.server.limit.connections.per.user.ipaddress`	<undefined>	Maximum kyuubi server connections per user:ipaddress combination. Any user-ipaddress exceeding this limit will not be allowed to connect.	int	1.6.0
`kyuubi.server.name`	<undefined>	The name of Kyuubi Server.	string	1.5.0

Session

Key	Default	Meaning	Type	Since
`kyuubi.session.check.interval`	PT5M	The check interval for session timeout.	duration	1.0.0
`kyuubi.session.conf.advisor`	<undefined>	A config advisor plugin for Kyuubi Server. This plugin can provide custom configs for different users or session configs and overwrite the session configs before a new session is opened. This config value should be a class which is a child of 'org.apache.kyuubi.plugin.SessionConfAdvisor' and has a zero-arg constructor.	string	1.5.0
`kyuubi.session.conf.ignore.list`		A comma separated list of ignored keys. If the client connection contains any of them, the key and the corresponding value will be removed silently during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax.	seq	1.2.0
`kyuubi.session.conf.restrict.list`		A comma separated list of restricted keys. If the client connection contains any of them, the connection will be rejected explicitly during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax.	seq	1.2.0
`kyuubi.session.engine.alive.probe.enabled`	false	Whether to enable the engine alive probe. If true, we will create a companion thrift client that sends simple requests to check whether the engine is still alive.	boolean	1.6.0
`kyuubi.session.engine.alive.probe.interval`	PT10S	The interval for engine alive probe.	duration	1.6.0
`kyuubi.session.engine.alive.timeout`	PT2M	The timeout for engine alive. If no alive probe has succeeded within the last timeout window, the engine will be marked as not alive.	duration	1.6.0
`kyuubi.session.engine.check.interval`	PT1M	The check interval for engine timeout	duration	1.0.0
`kyuubi.session.engine.flink.main.resource`	<undefined>	The package used to create Flink SQL engine remote job. If it is undefined, Kyuubi will use the default	string	1.4.0
`kyuubi.session.engine.flink.max.rows`	1000000	Max rows of Flink query results. For batch queries, rows that exceeds the limit would be ignored. For streaming queries, the query would be canceled if the limit is reached.	int	1.5.0
`kyuubi.session.engine.hive.main.resource`	<undefined>	The package used to create Hive engine remote job. If it is undefined, Kyuubi will use the default	string	1.6.0
`kyuubi.session.engine.idle.timeout`	PT30M	Engine idle timeout. The engine will self-terminate when it is not accessed for this duration. 0 or negative means not to self-terminate.	duration	1.0.0
`kyuubi.session.engine.initialize.timeout`	PT3M	Timeout for starting the background engine, e.g. SparkSQLEngine.	duration	1.0.0
`kyuubi.session.engine.launch.async`	true	When opening kyuubi session, whether to launch backend engine asynchronously. When true, the Kyuubi server will set up the connection with the client without delay as the backend engine will be created asynchronously.	boolean	1.4.0
`kyuubi.session.engine.log.timeout`	PT24H	If we use Spark as the engine then the session submit log is the console output of spark-submit. The session submit log is retained until it is older than this value.	duration	1.1.0
`kyuubi.session.engine.login.timeout`	PT15S	The timeout of creating the connection to remote sql query engine	duration	1.0.0
`kyuubi.session.engine.request.timeout`	PT0S	The timeout of awaiting response after sending request to remote sql query engine	duration	1.4.0
`kyuubi.session.engine.share.level`	USER	(deprecated) - Using kyuubi.engine.share.level instead	string	1.0.0
`kyuubi.session.engine.spark.main.resource`	<undefined>	The package used to create Spark SQL engine remote application. If it is undefined, Kyuubi will use the default	string	1.0.0
`kyuubi.session.engine.spark.max.lifetime`	PT0S	Max lifetime for spark engine, the engine will self-terminate when it reaches the end of life. 0 or negative means not to self-terminate.	duration	1.6.0
`kyuubi.session.engine.spark.progress.timeFormat`	yyyy-MM-dd HH:mm:ss.SSS	The time format of the progress bar	string	1.6.0
`kyuubi.session.engine.spark.progress.update.interval`	PT1S	Update period of progress bar.	duration	1.6.0
`kyuubi.session.engine.spark.showProgress`	false	When true, show the progress bar in the spark engine log.	boolean	1.6.0
`kyuubi.session.engine.startup.error.max.size`	8192	If an error occurs during engine bootstrapping, use this config to limit the length of the error message (in characters).	int	1.1.0
`kyuubi.session.engine.startup.maxLogLines`	10	The maximum number of engine log lines when errors occur during engine startup phase. Note that this max lines is for client-side to help track engine startup issue.	int	1.4.0
`kyuubi.session.engine.startup.waitCompletion`	true	Whether to wait for completion after the engine starts. If false, the startup process will be destroyed after the engine is started. Note that this should only be set to false when the driver is not running locally, such as in yarn-cluster mode; otherwise, the engine will be killed.	boolean	1.5.0
`kyuubi.session.engine.trino.connection.catalog`	<undefined>	The default catalog that trino engine will connect to	string	1.5.0
`kyuubi.session.engine.trino.connection.url`	<undefined>	The server url that trino engine will connect to	string	1.5.0
`kyuubi.session.engine.trino.main.resource`	<undefined>	The package used to create Trino engine remote job. If it is undefined, Kyuubi will use the default	string	1.5.0
`kyuubi.session.idle.timeout`	PT6H	Session idle timeout. The session will be closed when it is not accessed for this duration	duration	1.2.0
`kyuubi.session.name`	<undefined>	A human readable name of the session; an empty string is used by default. This name will be recorded in events. Note that we only apply this value from the session conf.	string	1.4.0
`kyuubi.session.timeout`	PT6H	(deprecated) Session timeout. The session will be closed when it is not accessed for this duration	duration	1.0.0
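
The difference between `kyuubi.session.conf.ignore.list` and `kyuubi.session.conf.restrict.list` can be illustrated with a small sketch. It only mirrors the behavior described in the table (ignored keys are dropped silently, restricted keys cause the connection to be rejected); the names `validate_session_conf` and `RestrictedConfError` are hypothetical and do not come from Kyuubi.

```python
from typing import Dict, Iterable


class RestrictedConfError(ValueError):
    """Hypothetical error standing in for a rejected connection."""


def validate_session_conf(conf: Dict[str, str],
                          ignore_list: Iterable[str],
                          restrict_list: Iterable[str]) -> Dict[str, str]:
    """Apply the ignore and restrict lists to the configs sent by a client connection."""
    ignored = set(ignore_list)
    restricted = set(restrict_list)

    # Any restricted key causes the whole connection to be rejected explicitly.
    rejected = sorted(key for key in conf if key in restricted)
    if rejected:
        raise RestrictedConfError("restricted keys in connection: " + ", ".join(rejected))

    # Ignored keys and their values are removed silently.
    return {key: value for key, value in conf.items() if key not in ignored}
```

With `spark.master` on the ignore list, a connection carrying `spark.master=yarn` would proceed without that key; with the same key on the restrict list, the connection would fail instead.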

Spnego

Key	Default	Meaning	Type	Since
`kyuubi.spnego.keytab`	<undefined>	Keytab file for SPNego principal	string	1.6.0
`kyuubi.spnego.principal`	<undefined>	SPNego service principal, typical value would look like HTTP/_HOST@EXAMPLE.COM. SPNego service principal would be used when restful Kerberos security is enabled. This needs to be set only if SPNEGO is to be used in authentication.	string	1.6.0

Zookeeper

Key	Default	Meaning	Type	Since
`kyuubi.zookeeper.embedded.client.port`	2181	clientPort for the embedded zookeeper server to listen for client connections, a client here could be Kyuubi server, engine and JDBC client	int	1.2.0
`kyuubi.zookeeper.embedded.client.port.address`	<undefined>	clientPortAddress for the embedded zookeeper server to	string	1.2.0
`kyuubi.zookeeper.embedded.data.dir`	embedded_zookeeper	dataDir for the embedded zookeeper server, where it stores the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.	string	1.2.0
`kyuubi.zookeeper.embedded.data.log.dir`	embedded_zookeeper	dataLogDir for the embedded zookeeper server, where it writes the transaction log.	string	1.2.0
`kyuubi.zookeeper.embedded.directory`	embedded_zookeeper	The temporary directory for the embedded zookeeper server	string	1.0.0
`kyuubi.zookeeper.embedded.max.client.connections`	120	maxClientCnxns for the embedded zookeeper server, which limits the number of concurrent connections from a single client identified by IP address	int	1.2.0
`kyuubi.zookeeper.embedded.max.session.timeout`	60000	maxSessionTimeout in milliseconds that the embedded zookeeper server will allow the client to negotiate. Defaults to 20 times the tickTime	int	1.2.0
`kyuubi.zookeeper.embedded.min.session.timeout`	6000	minSessionTimeout in milliseconds that the embedded zookeeper server will allow the client to negotiate. Defaults to 2 times the tickTime	int	1.2.0
`kyuubi.zookeeper.embedded.port`	2181	The port of the embedded zookeeper server	int	1.0.0
`kyuubi.zookeeper.embedded.tick.time`	3000	tickTime in milliseconds for the embedded zookeeper server	int	1.2.0
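
The default session timeout bounds in the table follow from the tickTime, as the short calculation below shows. The helper name `session_timeout_bounds` is made up for this illustration.

```python
from typing import Tuple


def session_timeout_bounds(tick_time_ms: int) -> Tuple[int, int]:
    """Return (minSessionTimeout, maxSessionTimeout) in milliseconds for a tickTime."""
    # The table states the defaults as 2 times and 20 times the tickTime.
    return 2 * tick_time_ms, 20 * tick_time_ms


min_ms, max_ms = session_timeout_bounds(3000)
print(min_ms, max_ms)  # 6000 60000
```

With the default tickTime of 3000 ms this gives the 6000 ms and 60000 ms defaults listed above.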

Spark Configurations

Via spark-defaults.conf

Setting them in $SPARK_HOME/conf/spark-defaults.conf supplies default values for the SQL engine application. Available properties can be found in the official Spark online documentation for Spark Configurations.

Via kyuubi-defaults.conf

Setting them in $KYUUBI_HOME/conf/kyuubi-defaults.conf also supplies default values for the SQL engine application. These properties will override all settings in $SPARK_HOME/conf/spark-defaults.conf.

Via JDBC Connection URL

Setting them in the JDBC Connection URL supplies session-specific values for each SQL engine. For example: jdbc:hive2://localhost:10009/default;#spark.sql.shuffle.partitions=2;spark.executor.memory=5g

Runtime SQL Configuration
- Runtime SQL Configurations take effect every time.
Static SQL and Spark Core Configuration
- Static SQL Configurations and other Spark core configs, e.g. spark.executor.memory, take effect only if there is no existing SQL engine application. Otherwise, they are ignored.
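
A connection URL of the shape shown above can be assembled from a map of Spark properties. The following is a minimal sketch that reproduces only the format of the example in this section; the function name `build_kyuubi_jdbc_url` is hypothetical, and the sketch assumes that keys and values need no escaping.

```python
from typing import Mapping


def build_kyuubi_jdbc_url(host: str, port: int, database: str,
                          confs: Mapping[str, str]) -> str:
    """Build a JDBC URL with session-specific configs appended after ';#'."""
    base = f"jdbc:hive2://{host}:{port}/{database}"
    if not confs:
        return base
    # Configs are written as key=value pairs separated by ';'.
    pairs = ";".join(f"{key}={value}" for key, value in confs.items())
    return f"{base};#{pairs}"


url = build_kyuubi_jdbc_url(
    "localhost", 10009, "default",
    {"spark.sql.shuffle.partitions": "2", "spark.executor.memory": "5g"},
)
print(url)
```

Here `spark.sql.shuffle.partitions` is a runtime SQL configuration, whereas `spark.executor.memory` only takes effect when the connection launches a new SQL engine application.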

Via SET Syntax

Please refer to the Spark official online documentation for SET Command

Flink Configurations

Via flink-conf.yaml

Setting them in $FLINK_HOME/conf/flink-conf.yaml supplies default values for the SQL engine application. Available properties can be found in the official Flink online documentation for Flink Configurations.

Via kyuubi-defaults.conf

Setting them in $KYUUBI_HOME/conf/kyuubi-defaults.conf also supplies default values for the SQL engine application. You can use properties with the additional prefix flink. to override settings in $FLINK_HOME/conf/flink-conf.yaml.

For example:

flink.parallelism.default 2
flink.taskmanager.memory.process.size 5g

The above options in kyuubi-defaults.conf will set parallelism.default: 2 and taskmanager.memory.process.size: 5g in the Flink configuration.
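
The prefix handling can be pictured as stripping `flink.` from matching keys and layering the result over the values from flink-conf.yaml. This is an illustrative sketch under that assumption, not the code Kyuubi uses; `flink_overrides` and `effective_flink_conf` are hypothetical names.

```python
from typing import Dict

FLINK_PREFIX = "flink."


def flink_overrides(kyuubi_defaults: Dict[str, str]) -> Dict[str, str]:
    """Pick the flink.-prefixed entries and drop the prefix from their keys."""
    return {
        key[len(FLINK_PREFIX):]: value
        for key, value in kyuubi_defaults.items()
        if key.startswith(FLINK_PREFIX)
    }


def effective_flink_conf(flink_conf: Dict[str, str],
                         kyuubi_defaults: Dict[str, str]) -> Dict[str, str]:
    """Entries from kyuubi-defaults.conf override those from flink-conf.yaml."""
    return {**flink_conf, **flink_overrides(kyuubi_defaults)}


kyuubi_defaults = {
    "flink.parallelism.default": "2",
    "flink.taskmanager.memory.process.size": "5g",
    "kyuubi.session.idle.timeout": "PT6H",
}
print(effective_flink_conf({"parallelism.default": "1"}, kyuubi_defaults))
```

Keys without the `flink.` prefix, such as `kyuubi.session.idle.timeout` here, are left out of the Flink configuration.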

Via JDBC Connection URL

Setting them in the JDBC Connection URL supplies session-specific values for each SQL engine. For example: jdbc:hive2://localhost:10009/default;#parallelism.default=2;taskmanager.memory.process.size=5g

Via SET Statements

Please refer to the Flink official online documentation for SET Statements

Logging

Kyuubi uses log4j for logging. You can configure it using $KYUUBI_HOME/conf/log4j2.properties.

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = STDOUT

# Console Appender
appender.console.type = Console
appender.console.name = STDOUT
appender.console.target = SYSTEM_OUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{HH:mm:ss.SSS} %p %c: %m%n

appender.console.filter.1.type = Filters

appender.console.filter.1.a.type = ThresholdFilter
appender.console.filter.1.a.level = info

# SPARK-34128: Suppress undesirable TTransportException warnings, due to THRIFT-4805
appender.console.filter.1.b.type = RegexFilter
appender.console.filter.1.b.regex = .*Thrift error occurred during processing of message.*
appender.console.filter.1.b.onMatch = deny
appender.console.filter.1.b.onMismatch = neutral

# Set the default kyuubi-ctl log level to ERROR. When running the kyuubi-ctl, the
# log level for this class is used to overwrite the root logger's log level.
logger.ctl.name = org.apache.kyuubi.ctl.ServiceControlCli
logger.ctl.level = error

# Analysis MySQLFrontend protocol traffic
# logger.mysql.name = org.apache.kyuubi.server.mysql.codec
# logger.mysql.level = trace

# Kyuubi BeeLine
logger.beeline.name = org.apache.hive.beeline.KyuubiBeeLine
logger.beeline.level = error

Other Configurations

Hadoop Configurations

Specify HADOOP_CONF_DIR as the directory that contains the Hadoop configuration files, or treat them as Spark properties with a spark.hadoop. prefix. Please refer to the official Spark online documentation for Inheriting Hadoop Cluster Configuration. Also, please refer to Apache Hadoop's online documentation for an overview of how to configure Hadoop.

Hive Configurations

These configurations are used by the SQL engine application to talk to the Hive MetaStore and can be configured in a hive-site.xml. Place it in the $SPARK_HOME/conf directory, or treat them as Spark properties with a spark.hadoop. prefix.

User Defaults

In Kyuubi, we can configure user default settings to meet separate needs. These user defaults override system defaults, but will be overridden by those from the JDBC Connection URL or SET command where applicable. They take effect ONLY when creating the SQL engine application. User default settings are in the form of ___{username}___.{config key}. There are three consecutive underscores (_) on both sides of the username and a dot (.) that separates the config key from the prefix. For example:

# For system defaults
spark.master=local
spark.sql.adaptive.enabled=true
# For a user named kent
___kent___.spark.master=yarn
___kent___.spark.sql.adaptive.enabled=false
# For a user named bob
___bob___.spark.master=spark://master:7077
___bob___.spark.executor.memory=8g

In the above case, if there are no related configurations from the JDBC Connection URL, kent will run his SQL engine application on YARN and prefer Spark AQE to be off, while bob will run his SQL engine application on a Spark standalone cluster with 8g heap memory for each executor and follow the Kyuubi system default for Spark AQE. Users who do not have custom configurations will use the system defaults.
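
The layering described above (system defaults, then user defaults, then the JDBC Connection URL) can be sketched as a simple merge. The code below is an illustration of that precedence under the key format given in this section; it is not Kyuubi's implementation, and `resolve_conf` and `USER_KEY` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Matches keys of the form ___{username}___.{config key}
USER_KEY = re.compile(r"^___(.+?)___\.(.+)$")


def resolve_conf(defaults: Dict[str, str], user: str,
                 jdbc_conf: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge system defaults, the user's defaults and JDBC URL configs, in that order."""
    system: Dict[str, str] = {}
    user_level: Dict[str, str] = {}
    for key, value in defaults.items():
        match = USER_KEY.match(key)
        if match is None:
            system[key] = value               # a system default
        elif match.group(1) == user:
            user_level[match.group(2)] = value  # a default for this user
        # defaults that belong to other users are skipped
    # Later layers override earlier ones.
    return {**system, **user_level, **(jdbc_conf or {})}


defaults = {
    "spark.master": "local",
    "spark.sql.adaptive.enabled": "true",
    "___kent___.spark.master": "yarn",
    "___kent___.spark.sql.adaptive.enabled": "false",
    "___bob___.spark.master": "spark://master:7077",
    "___bob___.spark.executor.memory": "8g",
}
print(resolve_conf(defaults, "kent"))
print(resolve_conf(defaults, "bob"))
```

A user without entries of their own, for example a hypothetical `alice`, would simply receive the two system defaults.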
