[KYUUBI #5756] Introduce specified initialized SQL to every engine

# 🔍 Description
## Issue References 🔗

This pull request fixes #5756

## Describe Your Solution 🔧

This PR introduces engine-specific initialize SQL configurations so that each engine type can run its own startup statements instead of sharing the generic ones, as shown in the snippet below:

- `kyuubi.engine.spark.initialize.sql` and `kyuubi.engine.flink.initialize.sql`, executed when the engine bootstraps, falling back to `kyuubi.engine.initialize.sql`;
- `kyuubi.session.engine.spark.initialize.sql` and `kyuubi.session.engine.flink.initialize.sql`, executed when each session opens, falling back to `kyuubi.engine.session.initialize.sql`.

The Spark and Flink engines are updated to read the engine-specific entries, and the new configurations are documented in the settings table.
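A minimal `kyuubi-defaults.conf` sketch (the database name and statements are illustrative, not part of this patch):

```properties
# $KYUUBI_HOME/conf/kyuubi-defaults.conf

# Generic initialize SQL, used by any engine without a specific override
kyuubi.engine.initialize.sql=SHOW DATABASES

# Spark engines run this statement instead of the generic one
kyuubi.engine.spark.initialize.sql=USE warehouse_db

# kyuubi.engine.flink.initialize.sql is unset, so Flink engines
# fall back to kyuubi.engine.initialize.sql above
```

With this resolution order, a deployment can warm up Spark engines differently from Flink engines without touching the generic defaults.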

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

The generic `kyuubi.engine.initialize.sql` and `kyuubi.engine.session.initialize.sql` apply to every engine type alike, so there is no way to give Spark and Flink engines different initialization statements.

#### Behavior With This Pull Request 🎉

Spark and Flink engines resolve their own `kyuubi.engine.<type>.initialize.sql` and `kyuubi.session.engine.<type>.initialize.sql` entries first, falling back to the generic configurations when the engine-specific ones are unset.

#### Related Unit Tests

- `FlinkEngineInitializeSuite`
- `SparkEngineSuites`
- `SingleSessionSuite`
- `InitializeSQLSuite`

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [x] No license issues.
- [x] Milestone correctly set?
- [x] Test coverage is ok
- [x] Assignees are selected.
- [x] Minimum number of approvals
- [x] No changes are requested

**Be nice. Be informative.**

Closes #5821 from hadoopkandy/KYUUBI-5756.

Closes #5756

046fe2a58 [kandy01.wang] [KYUUBI #5756] Introduce specified initialized SQL to every engine

Authored-by: kandy01.wang <kandy01.wang@vipshop.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
commit f3f643a309 (parent 5b6a729fa8), committed 2023-12-07 11:10:57 +08:00
10 changed files with 46 additions and 13 deletions


@@ -140,6 +140,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.engine.event.loggers | SPARK | A comma-separated list of engine history loggers, where engine/session/operation etc events go.<ul> <li>SPARK: the events will be written to the Spark listener bus.</li> <li>JSON: the events will be written to the location of kyuubi.engine.event.json.log.path</li> <li>JDBC: to be done</li> <li>CUSTOM: User-defined event handlers.</li></ul> Note that: Kyuubi supports custom event handlers with the Java SPI. To register a custom event handler, the user needs to implement a subclass of `org.apache.kyuubi.events.handler.CustomEventHandlerProvider` which has a zero-arg constructor. | seq | 1.3.0 |
| kyuubi.engine.flink.application.jars | &lt;undefined&gt; | A comma-separated list of the local jars to be shipped with the job to the cluster. For example, SQL UDF jars. Only effective in yarn application mode. | string | 1.8.0 |
| kyuubi.engine.flink.extra.classpath | &lt;undefined&gt; | The extra classpath for the Flink SQL engine, for configuring the location of hadoop client jars, etc. Only effective in yarn session mode. | string | 1.6.0 |
| kyuubi.engine.flink.initialize.sql | SHOW DATABASES | The initialize SQL for the Flink engine. It falls back to `kyuubi.engine.initialize.sql`. | seq | 1.8.1 |
| kyuubi.engine.flink.java.options | &lt;undefined&gt; | The extra Java options for the Flink SQL engine. Only effective in yarn session mode. | string | 1.6.0 |
| kyuubi.engine.flink.memory | 1g | The heap memory for the Flink SQL engine. Only effective in yarn session mode. | string | 1.6.0 |
| kyuubi.engine.hive.event.loggers | JSON | A comma-separated list of engine history loggers, where engine/session/operation etc events go.<ul> <li>JSON: the events will be written to the location of kyuubi.engine.event.json.log.path</li> <li>JDBC: to be done</li> <li>CUSTOM: to be done.</li></ul> | seq | 1.7.0 |
@@ -174,6 +175,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.engine.share.level.subdomain | &lt;undefined&gt; | Allow end-users to create a subdomain for the share level of an engine. A subdomain is a case-insensitive string value that must be a valid zookeeper subpath. For example, for the `USER` share level, an end-user can share a certain engine within a subdomain, not for all of its clients. End-users are free to create multiple engines in the `USER` share level. When the engine pool is disabled, 'default' is used if absent. | string | 1.4.0 |
| kyuubi.engine.single.spark.session | false | When set to true, this engine is running in a single session mode. All the JDBC/ODBC connections share the temporary views, function registries, SQL configuration and the current database. | boolean | 1.3.0 |
| kyuubi.engine.spark.event.loggers | SPARK | A comma-separated list of engine loggers, where engine/session/operation etc events go.<ul> <li>SPARK: the events will be written to the Spark listener bus.</li> <li>JSON: the events will be written to the location of kyuubi.engine.event.json.log.path</li> <li>JDBC: to be done</li> <li>CUSTOM: to be done.</li></ul> | seq | 1.7.0 |
| kyuubi.engine.spark.initialize.sql | SHOW DATABASES | The initialize SQL for the Spark engine. It falls back to `kyuubi.engine.initialize.sql`. | seq | 1.8.1 |
| kyuubi.engine.spark.python.env.archive | &lt;undefined&gt; | Portable Python env archive used for Spark engine Python language mode. | string | 1.7.0 |
| kyuubi.engine.spark.python.env.archive.exec.path | bin/python | The Python exec path under the Python env archive. | string | 1.7.0 |
| kyuubi.engine.spark.python.home.archive | &lt;undefined&gt; | Spark archive containing $SPARK_HOME/python directory, which is used to init session Python worker for Python language mode. | string | 1.7.0 |
@@ -427,6 +429,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.session.engine.alive.timeout | PT2M | The timeout for engine alive. If there is no alive probe success in the last timeout window, the engine will be marked as no-alive. | duration | 1.6.0 |
| kyuubi.session.engine.check.interval | PT1M | The check interval for engine timeout | duration | 1.0.0 |
| kyuubi.session.engine.flink.fetch.timeout | &lt;undefined&gt; | Result fetch timeout for Flink engine. If the timeout is reached, the result fetch would be stopped and the results fetched so far would be returned. If no data are fetched, a TimeoutException would be thrown. | duration | 1.8.0 |
| kyuubi.session.engine.flink.initialize.sql || The initialize SQL for the Flink session. It falls back to `kyuubi.engine.session.initialize.sql`. | seq | 1.8.1 |
| kyuubi.session.engine.flink.main.resource | &lt;undefined&gt; | The package used to create Flink SQL engine remote job. If it is undefined, Kyuubi will use the default | string | 1.4.0 |
| kyuubi.session.engine.flink.max.rows | 1000000 | Max rows of Flink query results. For batch queries, rows exceeding the limit would be ignored. For streaming queries, the query would be canceled if the limit is reached. | int | 1.5.0 |
| kyuubi.session.engine.hive.main.resource | &lt;undefined&gt; | The package used to create Hive engine remote job. If it is undefined, Kyuubi will use the default | string | 1.6.0 |
@@ -438,6 +441,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
| kyuubi.session.engine.open.max.attempts | 9 | The number of times an open engine will retry when encountering a special error. | int | 1.7.0 |
| kyuubi.session.engine.open.retry.wait | PT10S | How long to wait before retrying to open the engine after failure. | duration | 1.7.0 |
| kyuubi.session.engine.share.level | USER | (deprecated) - Using kyuubi.engine.share.level instead | string | 1.0.0 |
| kyuubi.session.engine.spark.initialize.sql || The initialize SQL for the Spark session. It falls back to `kyuubi.engine.session.initialize.sql`. | seq | 1.8.1 |
| kyuubi.session.engine.spark.main.resource | &lt;undefined&gt; | The package used to create Spark SQL engine remote application. If it is undefined, Kyuubi will use the default | string | 1.0.0 |
| kyuubi.session.engine.spark.max.initial.wait | PT1M | Max wait time for the initial connection to Spark engine. The engine will self-terminate if no new incoming connection is established within this time. This setting only applies at the CONNECTION share level. 0 or negative means not to self-terminate. | duration | 1.8.0 |
| kyuubi.session.engine.spark.max.lifetime | PT0S | Max lifetime for Spark engine, the engine will self-terminate when it reaches the end of life. 0 or negative means not to self-terminate. | duration | 1.6.0 |


@@ -32,7 +32,7 @@ import org.apache.flink.table.gateway.service.context.DefaultContext
 import org.apache.kyuubi.{Logging, Utils}
 import org.apache.kyuubi.Utils.{addShutdownHook, currentUser, FLINK_ENGINE_SHUTDOWN_PRIORITY}
 import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.config.KyuubiConf.ENGINE_INITIALIZE_SQL
+import org.apache.kyuubi.config.KyuubiConf.ENGINE_FLINK_INITIALIZE_SQL
 import org.apache.kyuubi.config.KyuubiReservedKeys.{KYUUBI_ENGINE_NAME, KYUUBI_SESSION_USER_KEY}
 import org.apache.kyuubi.engine.flink.FlinkSQLEngine.{countDownLatch, currentEngine}
 import org.apache.kyuubi.service.Serverable
@@ -139,7 +139,7 @@ object FlinkSQLEngine extends Logging {
       tableEnv.executeSql("select 'kyuubi'").await()
     }
-    kyuubiConf.get(ENGINE_INITIALIZE_SQL).foreach { stmt =>
+    kyuubiConf.get(ENGINE_FLINK_INITIALIZE_SQL).foreach { stmt =>
       tableEnv.executeSql(stmt).await()
     }


@@ -65,12 +65,12 @@ class FlinkSessionImpl(
   override def open(): Unit = {
     val executor = fSession.createExecutor(Configuration.fromMap(fSession.getSessionConfig))
-    sessionManager.getConf.get(ENGINE_SESSION_INITIALIZE_SQL).foreach { sql =>
+    sessionManager.getConf.get(ENGINE_SESSION_FLINK_INITIALIZE_SQL).foreach { sql =>
       try {
         executor.executeStatement(OperationHandle.create, sql)
       } catch {
         case NonFatal(e) =>
-          throw KyuubiSQLException(s"execute ${ENGINE_SESSION_INITIALIZE_SQL.key} $sql ", e)
+          throw KyuubiSQLException(s"execute ${ENGINE_SESSION_FLINK_INITIALIZE_SQL.key} $sql ", e)
       }
     }


@@ -53,8 +53,8 @@ class FlinkEngineInitializeSuite extends HiveJDBCTestHelper
     ENGINE_TYPE.key -> "FLINK_SQL",
     ENGINE_SHARE_LEVEL.key -> shareLevel,
     OPERATION_PLAN_ONLY_MODE.key -> NoneMode.name,
-    ENGINE_INITIALIZE_SQL.key -> ENGINE_INITIALIZE_SQL_VALUE,
-    ENGINE_SESSION_INITIALIZE_SQL.key -> ENGINE_SESSION_INITIALIZE_SQL_VALUE,
+    ENGINE_FLINK_INITIALIZE_SQL.key -> ENGINE_INITIALIZE_SQL_VALUE,
+    ENGINE_SESSION_FLINK_INITIALIZE_SQL.key -> ENGINE_SESSION_INITIALIZE_SQL_VALUE,
     KYUUBI_SESSION_USER_KEY -> "kandy")
 }


@@ -290,7 +290,8 @@ object SparkSQLEngine extends Logging {
     KyuubiSparkUtil.initializeSparkSession(
       session,
-      kyuubiConf.get(ENGINE_INITIALIZE_SQL) ++ kyuubiConf.get(ENGINE_SESSION_INITIALIZE_SQL))
+      kyuubiConf.get(ENGINE_SPARK_INITIALIZE_SQL) ++ kyuubiConf.get(
+        ENGINE_SESSION_SPARK_INITIALIZE_SQL))
     session.sparkContext.setLocalProperty(KYUUBI_ENGINE_URL, KyuubiSparkUtil.engineUrl)
     session
   }


@@ -130,7 +130,9 @@ class SparkSQLSessionManager private (name: String, spark: SparkSession)
   private def newSparkSession(rootSparkSession: SparkSession): SparkSession = {
     val newSparkSession = rootSparkSession.newSession()
-    KyuubiSparkUtil.initializeSparkSession(newSparkSession, conf.get(ENGINE_SESSION_INITIALIZE_SQL))
+    KyuubiSparkUtil.initializeSparkSession(
+      newSparkSession,
+      conf.get(ENGINE_SESSION_SPARK_INITIALIZE_SQL))
     newSparkSession
   }


@@ -104,7 +104,7 @@ class SparkEngineSuites extends KyuubiFunSuite {
     withSystemProperty(Map(
       s"spark.$KYUUBI_ENGINE_SUBMIT_TIME_KEY" -> String.valueOf(submitTime),
       s"spark.${ENGINE_INIT_TIMEOUT.key}" -> String.valueOf(timeout),
-      s"spark.${ENGINE_INITIALIZE_SQL.key}" ->
+      s"spark.${ENGINE_SPARK_INITIALIZE_SQL.key}" ->
         "select 1 where java_method('java.lang.Thread', 'sleep', 60000L) is null")) {
       SparkSQLEngine.setupConf()
       SparkSQLEngine.currentEngine = None


@@ -28,7 +28,7 @@ class SingleSessionSuite extends WithSparkSQLEngine with HiveJDBCTestHelper {
     ENGINE_SHARE_LEVEL.key -> "SERVER",
     ENGINE_SINGLE_SPARK_SESSION.key -> "true",
     (
-      ENGINE_SESSION_INITIALIZE_SQL.key,
+      ENGINE_SESSION_SPARK_INITIALIZE_SQL.key,
       "CREATE DATABASE IF NOT EXISTS INIT_DB_SOLO;" +
         "CREATE TABLE IF NOT EXISTS INIT_DB_SOLO.test(a int) USING CSV;" +
         "INSERT INTO INIT_DB_SOLO.test VALUES (2);"))


@@ -2140,6 +2140,13 @@ object KyuubiConf {
       .toSequence(";")
       .createWithDefault(Nil)

+  val ENGINE_SESSION_FLINK_INITIALIZE_SQL: ConfigEntry[Seq[String]] =
+    buildConf("kyuubi.session.engine.flink.initialize.sql")
+      .doc("The initialize SQL for the Flink session. " +
+        "It falls back to `kyuubi.engine.session.initialize.sql`.")
+      .version("1.8.1")
+      .fallbackConf(ENGINE_SESSION_INITIALIZE_SQL)
+
   val ENGINE_DEREGISTER_EXCEPTION_CLASSES: ConfigEntry[Set[String]] =
     buildConf("kyuubi.engine.deregister.exception.classes")
       .doc("A comma-separated list of exception classes. If there is any exception thrown," +
@@ -2583,6 +2590,13 @@ object KyuubiConf {
       .stringConf
       .createWithDefault("yyyy-MM-dd HH:mm:ss.SSS")

+  val ENGINE_SESSION_SPARK_INITIALIZE_SQL: ConfigEntry[Seq[String]] =
+    buildConf("kyuubi.session.engine.spark.initialize.sql")
+      .doc("The initialize SQL for the Spark session. " +
+        "It falls back to `kyuubi.engine.session.initialize.sql`.")
+      .version("1.8.1")
+      .fallbackConf(ENGINE_SESSION_INITIALIZE_SQL)
+
   val ENGINE_TRINO_MEMORY: ConfigEntry[String] =
     buildConf("kyuubi.engine.trino.memory")
       .doc("The heap memory for the Trino query engine")
@@ -2657,6 +2671,12 @@ object KyuubiConf {
       .stringConf
       .createOptional

+  val ENGINE_FLINK_INITIALIZE_SQL: ConfigEntry[Seq[String]] =
+    buildConf("kyuubi.engine.flink.initialize.sql")
+      .doc("The initialize SQL for the Flink engine. It falls back to `kyuubi.engine.initialize.sql`.")
+      .version("1.8.1")
+      .fallbackConf(ENGINE_INITIALIZE_SQL)
+
   val SERVER_LIMIT_CONNECTIONS_PER_USER: OptionalConfigEntry[Int] =
     buildConf("kyuubi.server.limit.connections.per.user")
       .doc("Maximum kyuubi server connections per user." +
@@ -3154,6 +3174,12 @@ object KyuubiConf {
       .toSequence()
       .createWithDefault(Seq("spark.driver.memory", "spark.executor.memory"))

+  val ENGINE_SPARK_INITIALIZE_SQL: ConfigEntry[Seq[String]] =
+    buildConf("kyuubi.engine.spark.initialize.sql")
+      .doc("The initialize SQL for the Spark engine. It falls back to `kyuubi.engine.initialize.sql`.")
+      .version("1.8.1")
+      .fallbackConf(ENGINE_INITIALIZE_SQL)
+
   val ENGINE_HIVE_EVENT_LOGGERS: ConfigEntry[Seq[String]] =
     buildConf("kyuubi.engine.hive.event.loggers")
       .doc("A comma-separated list of engine history loggers, where engine/session/operation etc" +


@@ -19,19 +19,19 @@ package org.apache.kyuubi.engine.spark

 import org.apache.kyuubi.WithKyuubiServer
 import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.config.KyuubiConf.{ENGINE_INITIALIZE_SQL, ENGINE_SESSION_INITIALIZE_SQL}
+import org.apache.kyuubi.config.KyuubiConf.{ENGINE_SESSION_SPARK_INITIALIZE_SQL, ENGINE_SPARK_INITIALIZE_SQL}
 import org.apache.kyuubi.operation.HiveJDBCTestHelper

 class InitializeSQLSuite extends WithKyuubiServer with HiveJDBCTestHelper {
   override protected val conf: KyuubiConf = {
     KyuubiConf()
       .set(
-        ENGINE_INITIALIZE_SQL.key,
+        ENGINE_SPARK_INITIALIZE_SQL.key,
         "CREATE DATABASE IF NOT EXISTS INIT_DB;" +
           "CREATE TABLE IF NOT EXISTS INIT_DB.test(a int) USING CSV;" +
           "INSERT OVERWRITE TABLE INIT_DB.test VALUES (1);")
       .set(
-        ENGINE_SESSION_INITIALIZE_SQL.key,
+        ENGINE_SESSION_SPARK_INITIALIZE_SQL.key,
         "CREATE DATABASE IF NOT EXISTS INIT_DB;" +
           "CREATE TABLE IF NOT EXISTS INIT_DB.test(a int) USING CSV;" +
           "INSERT INTO INIT_DB.test VALUES (2);")