### _Why are the changes needed?_
Manually cherry-pick some Spark patches into Kyuubi:
1. [Support query auto timeout cancel on thriftserver](https://github.com/apache/spark/pull/29933)
2. [Add config to control if cancel invoke interrupt task on thriftserver](https://github.com/apache/spark/pull/30481)

To stay backward compatible with earlier Spark versions, we hard-code the config key instead of referring to Spark SQLConf.

Note that the existing operation timeout (`kyuubi.operation.idle.timeout`) only cancels operations that the client is no longer accessing. That is, if a query runs for a long time while the client is still alive, the query is not cancelled. The newly added config `spark.sql.thriftServer.queryTimeout` handles this case.

### _How was this patch tested?_
Add new test.

Closes #451 from ulysses-you/query-timeout.
212f579 [ulysses-you] docs
9206538 [ulysses-you] empty flaky test
ddab9bf [ulysses-you] flaty test
1da02a0 [ulysses-you] flaty test
edfadf1 [ulysses-you] nit
3f9920b [ulysses-you] address comment
9492c48 [ulysses-you] correct timeout
5df997e [ulysses-you] nit
2124952 [ulysses-you] address comment
192fdcc [ulysses-you] fix tets
d684af6 [ulysses-you] global config
1d1adda [ulysses-you] empty
967a63e [ulysses-you] correct import
128948e [ulysses-you] add session conf in session
144d51b [ulysses-you] fix
a90248b [ulysses-you] unused import
c90386f [ulysses-you] timeout move to operation manager
d780965 [ulysses-you] update docs
a5f7138 [ulysses-you] fix test
f7c7308 [ulysses-you] config name
7f3fb3d [ulysses-you] change conf place
97a011e [ulysses-you] unnecessary change
0953a76 [ulysses-you] move test
38ac0c0 [ulysses-you] Merge branch 'master' of https://github.com/yaooqinn/kyuubi into query-timeout
71bea97 [ulysses-you] refector implementation
35ef6f9 [ulysses-you] update conf
0cad8e2 [ulysses-you] Support query auto timeout cancel on thriftserver

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
Introduction to the Kyuubi Configurations System
Kyuubi provides several ways to configure the system and corresponding engines.
Environments
You can configure the environment variables in $KYUUBI_HOME/conf/kyuubi-env.sh, e.g. JAVA_HOME; this Java runtime will then be used both for the Kyuubi server instance and for the applications it launches. You can also change the variable in the subprocess's env configuration file, e.g. $SPARK_HOME/conf/spark-env.sh, to use a more specific environment for SQL engine applications.
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# - JAVA_HOME Java runtime to use. By default use "java" from PATH.
#
#
# - KYUUBI_CONF_DIR Directory containing the Kyuubi configurations to use.
# (Default: $KYUUBI_HOME/conf)
# - KYUUBI_LOG_DIR Directory for Kyuubi server-side logs.
# (Default: $KYUUBI_HOME/logs)
# - KYUUBI_PID_DIR Directory stores the Kyuubi instance pid file.
# (Default: $KYUUBI_HOME/pid)
# - KYUUBI_MAX_LOG_FILES Maximum number of Kyuubi server logs can rotate to.
# (Default: 5)
# - KYUUBI_JAVA_OPTS JVM options for the Kyuubi server itself in the form "-Dx=y".
# (Default: none).
# - KYUUBI_NICENESS The scheduling priority for Kyuubi server.
# (Default: 0)
# - KYUUBI_WORK_DIR_ROOT Root directory for launching sql engine applications.
# (Default: $KYUUBI_HOME/work)
# - HADOOP_CONF_DIR Directory containing the Hadoop / YARN configuration to use.
#
# - SPARK_HOME Spark distribution which you would like to use in Kyuubi.
# - SPARK_CONF_DIR Optional directory where the Spark configuration lives.
# (Default: $SPARK_HOME/conf)
#
## Examples ##
# export JAVA_HOME=/usr/jdk64/jdk1.8.0_152
# export HADOOP_CONF_DIR=/usr/ndp/current/mapreduce_client/conf
# export KYUUBI_JAVA_OPTS="-Xmx10g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -XX:MaxDirectMemorySize=1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:NewRatio=3 -XX:MetaspaceSize=512m"
Kyuubi Configurations
You can configure the Kyuubi properties in $KYUUBI_HOME/conf/kyuubi-defaults.conf. For example:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
## Kyuubi Configurations
#
# kyuubi.authentication NONE
# kyuubi.frontend.bind.host localhost
# kyuubi.frontend.bind.port 10009
#
# Details in https://kyuubi.readthedocs.io/en/latest/deployment/settings.html
Authentication
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.authentication | NONE | Client authentication types. | 1.0.0 |
| kyuubi.authentication.ldap.base.dn | <undefined> | LDAP base DN. | 1.0.0 |
| kyuubi.authentication.ldap.domain | <undefined> | LDAP domain. | 1.0.0 |
| kyuubi.authentication.ldap.url | <undefined> | SPACE character separated LDAP connection URL(s). | 1.0.0 |
| kyuubi.authentication.sasl.qop | auth | SASL QOP enables higher levels of protection for Kyuubi communication with clients. | 1.0.0 |
Backend
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.backend.engine.exec.pool.keepalive.time | PT1M | Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in SQL engine applications | 1.0.0 |
| kyuubi.backend.engine.exec.pool.shutdown.timeout | PT10S | Timeout(ms) for the operation execution thread pool to terminate in SQL engine applications | 1.0.0 |
| kyuubi.backend.engine.exec.pool.size | 100 | Number of threads in the operation execution thread pool of SQL engine applications | 1.0.0 |
| kyuubi.backend.engine.exec.pool.wait.queue.size | 100 | Size of the wait queue for the operation execution thread pool in SQL engine applications | 1.0.0 |
| kyuubi.backend.server.exec.pool.keepalive.time | PT1M | Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in Kyuubi server | 1.0.0 |
| kyuubi.backend.server.exec.pool.shutdown.timeout | PT10S | Timeout(ms) for the operation execution thread pool to terminate in Kyuubi server | 1.0.0 |
| kyuubi.backend.server.exec.pool.size | 100 | Number of threads in the operation execution thread pool of Kyuubi server | 1.0.0 |
| kyuubi.backend.server.exec.pool.wait.queue.size | 100 | Size of the wait queue for the operation execution thread pool of Kyuubi server | 1.0.0 |
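The time-valued defaults in these tables (PT1M, PT10S, and so on) appear to be written as ISO-8601 durations, even where the description says "ms". As a rough aid for reading them, the sketch below converts the hour/minute/second subset seen on this page into a Python `timedelta`. The function name `parse_duration` is hypothetical, and this is not how Kyuubi itself parses the values.

```python
import re
from datetime import timedelta

# Covers only the subset of ISO-8601 durations used on this page:
# optional hours, minutes and seconds after the "PT" prefix.
_DURATION = re.compile(
    r"^PT(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a value such as 'PT1M' or 'PT0.1S' into a timedelta."""
    match = _DURATION.match(text)
    # Reject strings that do not match, and a bare "PT" with no components.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {name: float(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


if __name__ == "__main__":
    for sample in ("PT1M", "PT10S", "PT3H", "PT0.1S"):
        print(sample, "=", parse_duration(sample).total_seconds(), "seconds")
```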
Delegation
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.delegation.key.update.interval | PT24H | unused yet | 1.0.0 |
| kyuubi.delegation.token.gc.interval | PT1H | unused yet | 1.0.0 |
| kyuubi.delegation.token.max.lifetime | PT168H | unused yet | 1.0.0 |
| kyuubi.delegation.token.renew.interval | PT168H | unused yet | 1.0.0 |
Frontend
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.frontend.backoff.slot.length | PT0.1S | Time to back off during login to the frontend service. | 1.0.0 |
| kyuubi.frontend.bind.host | <undefined> | Hostname or IP of the machine on which to run the frontend service. | 1.0.0 |
| kyuubi.frontend.bind.port | 10009 | Port of the machine on which to run the frontend service. | 1.0.0 |
| kyuubi.frontend.login.timeout | PT20S | Timeout for Thrift clients during login to the frontend service. | 1.0.0 |
| kyuubi.frontend.max.message.size | 104857600 | Maximum message size in bytes a Kyuubi server will accept. | 1.0.0 |
| kyuubi.frontend.max.worker.threads | 999 | Maximum number of threads in the worker thread pool for the frontend service | 1.0.0 |
| kyuubi.frontend.min.worker.threads | 9 | Minimum number of threads in the worker thread pool for the frontend service | 1.0.0 |
| kyuubi.frontend.worker.keepalive.time | PT1M | Keep-alive time (in milliseconds) for an idle worker thread | 1.0.0 |
Ha
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.ha.zookeeper.acl.enabled | false | Set to true if the zookeeper ensemble is kerberized | 1.0.0 |
| kyuubi.ha.zookeeper.connection.base.retry.wait | 1000 | Initial amount of time to wait between retries to the zookeeper ensemble | 1.0.0 |
| kyuubi.ha.zookeeper.connection.max.retries | 3 | Max retry times for connecting to the zookeeper ensemble | 1.0.0 |
| kyuubi.ha.zookeeper.connection.max.retry.wait | 30000 | Max amount of time the wait between retries can reach under the BOUNDED_EXPONENTIAL_BACKOFF policy, or max time until elapsed under the UNTIL_ELAPSED policy, when connecting to the zookeeper ensemble | 1.0.0 |
| kyuubi.ha.zookeeper.connection.retry.policy | EXPONENTIAL_BACKOFF | The retry policy for connecting to the zookeeper ensemble, all candidates are: | 1.0.0 |
| kyuubi.ha.zookeeper.connection.timeout | 15000 | The timeout(ms) of creating the connection to the zookeeper ensemble | 1.0.0 |
| kyuubi.ha.zookeeper.namespace | kyuubi | The root directory for the service to deploy its instance uri. Additionally, it creates a -[username] suffixed root directory for each application | 1.0.0 |
| kyuubi.ha.zookeeper.quorum | | The connection string for the zookeeper ensemble | 1.0.0 |
| kyuubi.ha.zookeeper.session.timeout | 60000 | The timeout(ms) of a connected session to be idled | 1.0.0 |
Kinit
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.kinit.interval | PT1H | How often will Kyuubi server run kinit -kt [keytab] [principal] to renew the local Kerberos credentials cache | 1.0.0 |
| kyuubi.kinit.keytab | <undefined> | Location of Kyuubi server's keytab. | 1.0.0 |
| kyuubi.kinit.max.attempts | 10 | How many times will kinit process retry | 1.0.0 |
| kyuubi.kinit.principal | <undefined> | Name of the Kerberos principal. | 1.0.0 |
Metrics
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.metrics.enabled | true | Set to true to enable the Kyuubi metrics system | 1.2.0 |
| kyuubi.metrics.json.report.location | metrics | Where the JSON metrics file is located | 1.2.0 |
| kyuubi.metrics.report.interval | PT5S | How often metrics are reported to JSON/console. No effect on JMX | 1.2.0 |
| kyuubi.metrics.reporters | JSON | A comma separated list for all metrics reporters | 1.2.0 |
Operation
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.operation.idle.timeout | PT3H | Operation will be closed when it's not accessed for this duration of time | 1.0.0 |
| kyuubi.operation.interrupt.on.cancel | true | When true, all running tasks will be interrupted if one cancels a query. When false, all running tasks will remain until finished. | 1.2.0 |
| kyuubi.operation.query.timeout | PT0S | Set a query duration timeout in seconds in Kyuubi. If the timeout is set to a positive value, a running query will be cancelled automatically when it times out. Otherwise the query continues to run till completion. If timeout values are set for each statement via java.sql.Statement.setQueryTimeout and they are smaller than this configuration value, they take precedence. If you set this timeout and prefer to cancel the queries right away without waiting for tasks to finish, consider enabling kyuubi.operation.interrupt.on.cancel together. | 1.2.0 |
| kyuubi.operation.status.polling.timeout | PT5S | Timeout(ms) for long polling asynchronous running sql query's status | 1.0.0 |
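The precedence rule described for kyuubi.operation.query.timeout can be read as a small decision function. The sketch below is one interpretation of that description, not Kyuubi's actual code: the function name is hypothetical, both values are taken in seconds, and the handling of a statement timeout when the configured value is not positive is an assumption, since the table does not spell it out.

```python
def effective_timeout_seconds(config_timeout: int, statement_timeout: int) -> int:
    """Pick the timeout that applies to a query; 0 means "no timeout".

    config_timeout    -- value of kyuubi.operation.query.timeout, in seconds
    statement_timeout -- value passed to java.sql.Statement.setQueryTimeout
    """
    config_is_set = config_timeout > 0
    statement_is_set = statement_timeout > 0
    # The configured value wins unless the statement asks for something smaller.
    if config_is_set and (not statement_is_set or config_timeout < statement_timeout):
        return config_timeout
    # Assumption: with no configured timeout, a statement timeout still applies.
    return statement_timeout if statement_is_set else 0


if __name__ == "__main__":
    for config, statement in [(0, 0), (60, 0), (60, 30), (60, 120), (0, 30)]:
        print(config, statement, "->", effective_timeout_seconds(config, statement))
```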
Session
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.session.check.interval | PT5M | The check interval for session timeout. | 1.0.0 |
| kyuubi.session.engine.check.interval | PT5M | The check interval for engine timeout | 1.0.0 |
| kyuubi.session.engine.idle.timeout | PT30M | Engine timeout: the engine will self-terminate when it's not accessed for this duration | 1.0.0 |
| kyuubi.session.engine.initialize.timeout | PT1M | Timeout for starting the background engine, e.g. SparkSQLEngine. | 1.0.0 |
| kyuubi.session.engine.log.timeout | PT24H | If we use Spark as the engine, the session submit log is the console output of spark-submit. The session submit log is retained until it is older than this value. | 1.1.0 |
| kyuubi.session.engine.login.timeout | PT15S | The timeout(ms) of creating the connection to remote sql query engine | 1.0.0 |
| kyuubi.session.engine.share.level | USER | The SQL engine App will be shared in different levels, available configs are: | 1.0.0 |
| kyuubi.session.engine.spark.main.resource | <undefined> | The package used to create Spark SQL engine remote application. If it is undefined, Kyuubi will use the default | 1.0.0 |
| kyuubi.session.engine.startup.error.max.size | 8192 | If an error occurs during engine bootstrapping, this config limits the length of the error message (in characters). | 1.1.0 |
| kyuubi.session.timeout | PT6H | Session timeout: the session will be closed when it's not accessed for this duration | 1.0.0 |
Zookeeper
| Key | Default | Meaning | Since |
|---|---|---|---|
| kyuubi.zookeeper.embedded.directory | embedded_zookeeper | The temporary directory for the embedded zookeeper server | 1.0.0 |
| kyuubi.zookeeper.embedded.port | 2181 | The port of the embedded zookeeper server | 1.0.0 |
Spark Configurations
Via spark-defaults.conf
Setting them in $SPARK_HOME/conf/spark-defaults.conf supplies default values for the SQL engine application. Available properties can be found in the official Spark online documentation for Spark Configurations.
Via kyuubi-defaults.conf
Setting them in $KYUUBI_HOME/conf/kyuubi-defaults.conf also supplies default values for the SQL engine application. These properties override all settings in $SPARK_HOME/conf/spark-defaults.conf.
Via JDBC Connection URL
Setting them in the JDBC Connection URL supplies session-specific settings for each SQL engine. For example: jdbc:hive2://localhost:10009/default;#spark.sql.shuffle.partitions=2;spark.executor.memory=5g
- Runtime SQL Configuration
  - Runtime SQL Configurations take effect every time.
- Static SQL and Spark Core Configuration
  - Static SQL Configurations and other Spark core configs, e.g. spark.executor.memory, take effect only if there is no existing SQL engine application. Otherwise, they are ignored.
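To make the structure of the example connection URL concrete, the sketch below pulls out the key/value pairs that follow the `#` separator. It is a simplified illustration that handles only the shape of the URL shown in this section; the function name `session_confs_from_url` is hypothetical and the code is not a full JDBC URL parser.

```python
def session_confs_from_url(url: str) -> dict:
    """Return the 'key=value' pairs listed after '#' in a connection URL."""
    # Everything after the first '#' is treated as the configuration list.
    _, _, fragment = url.partition("#")
    confs = {}
    for pair in fragment.split(";"):
        key, separator, value = pair.partition("=")
        # Skip empty segments and entries without an '=' sign.
        if separator and key.strip():
            confs[key.strip()] = value.strip()
    return confs


if __name__ == "__main__":
    url = (
        "jdbc:hive2://localhost:10009/default;"
        "#spark.sql.shuffle.partitions=2;spark.executor.memory=5g"
    )
    for key, value in session_confs_from_url(url).items():
        print(f"{key} = {value}")
```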
Via SET Syntax
Please refer to the Spark official online documentation for SET Command
Logging
Kyuubi uses log4j for logging. You can configure it using $KYUUBI_HOME/conf/log4j.properties.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %p %c{2}: %m%n
Other Configurations
Hadoop Configurations
Set HADOOP_CONF_DIR to the directory that contains the Hadoop configuration files, or treat the Hadoop settings as Spark properties with a spark.hadoop. prefix. Please refer to the official Spark online documentation for Inheriting Hadoop Cluster Configuration. Also, please refer to Apache Hadoop's online documentation for an overview of how to configure Hadoop.
Hive Configurations
These configurations are used by the SQL engine application to talk to the Hive MetaStore and can be configured in a hive-site.xml. Place it in the $SPARK_HOME/conf directory, or treat the settings as Spark properties with a spark.hadoop. prefix.
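The prefix approach amounts to renaming each Hadoop or Hive key before handing it over as a Spark property. A minimal sketch of that renaming follows; the helper name `as_spark_properties` is hypothetical and the metastore address is a placeholder value chosen only for illustration.

```python
def as_spark_properties(settings: dict, prefix: str = "spark.hadoop.") -> dict:
    """Prefix every key so the settings can be passed as Spark properties."""
    return {f"{prefix}{key}": value for key, value in settings.items()}


if __name__ == "__main__":
    # Placeholder value; replace with the address of a real metastore.
    hive_settings = {"hive.metastore.uris": "thrift://metastore-host:9083"}
    for key, value in as_spark_properties(hive_settings).items():
        print(f"{key}={value}")
```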
User Defaults
In Kyuubi, we can configure user default settings to meet separate needs. These user defaults override system defaults, but are overridden by those from the JDBC Connection URL or the SET command where applicable. They take effect ONLY when creating the SQL engine application.
User default settings take the form ___{username}___.{config key}. There are three consecutive underscores (_) on both sides of the username, and a dot (.) separates the prefix from the config key. For example:
# For system defaults
spark.master=local
spark.sql.adaptive.enabled=true
# For a user named kent
___kent___.spark.master=yarn
___kent___.spark.sql.adaptive.enabled=false
# For a user named bob
___bob___.spark.master=spark://master:7077
___bob___.spark.executor.memory=8g
In the above case, if there are no related configurations from the JDBC Connection URL, kent will run his SQL engine application on YARN and prefer the Spark AQE to be off, while bob will run his SQL engine application on a Spark standalone cluster with 8g heap memory for each executor and follow the Spark AQE behavior of the Kyuubi system default. Users who do not have custom configurations will use the system defaults.
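The resolution order described above can be sketched as a small merge: start from the system defaults, then apply the entries carrying the matching ___{username}___. prefix. This is an illustrative sketch only, with a hypothetical function name `resolve_defaults`; it does not reproduce Kyuubi's implementation, and it leaves out the later overrides from the JDBC Connection URL and the SET command.

```python
import re

# Keys of the form ___{username}___.{config key}
_USER_KEY = re.compile(r"^___(?P<user>.+?)___\.(?P<key>.+)$")


def resolve_defaults(properties: dict, username: str) -> dict:
    """Merge system defaults with the defaults declared for one user."""
    system_defaults = {}
    user_defaults = {}
    for key, value in properties.items():
        match = _USER_KEY.match(key)
        if match is None:
            system_defaults[key] = value
        elif match.group("user") == username:
            user_defaults[match.group("key")] = value
        # Entries that belong to other users are ignored.
    return {**system_defaults, **user_defaults}


if __name__ == "__main__":
    properties = {
        "spark.master": "local",
        "spark.sql.adaptive.enabled": "true",
        "___kent___.spark.master": "yarn",
        "___kent___.spark.sql.adaptive.enabled": "false",
        "___bob___.spark.master": "spark://master:7077",
        "___bob___.spark.executor.memory": "8g",
    }
    for user in ("kent", "bob", "alice"):
        print(user, resolve_defaults(properties, user))
```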
