Commit Graph

1189 Commits

Author SHA1 Message Date
dependabot[bot]
87f3f90120
[KYUUBI #7157] Bump form-data from 4.0.0 to 4.0.4 in /kyuubi-server/web-ui
Bumps [form-data](https://github.com/form-data/form-data) from 4.0.0 to 4.0.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/form-data/form-data/releases">form-data's releases</a>.</em></p>
<blockquote>
<h2>v4.0.1</h2>
<h3>Fixes</h3>
<ul>
<li>npmignore temporary build files (<a href="https://redirect.github.com/form-data/form-data/issues/532">#532</a>)</li>
<li>move util.isArray to Array.isArray (<a href="https://redirect.github.com/form-data/form-data/issues/564">#564</a>)</li>
</ul>
<h3>Tests</h3>
<ul>
<li>migrate from travis to GHA</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/form-data/form-data/blob/master/CHANGELOG.md">form-data's changelog</a>.</em></p>
<blockquote>
<h2><a href="https://github.com/form-data/form-data/compare/v4.0.3...v4.0.4">v4.0.4</a> - 2025-07-16</h2>
<h3>Commits</h3>
<ul>
<li>[meta] add <code>auto-changelog</code> <a href="811f68282f"><code>811f682</code></a></li>
<li>[Tests] handle predict-v8-randomness failures in node &lt; 17 and node &gt; 23 <a href="1d11a76434"><code>1d11a76</code></a></li>
<li>[Fix] Switch to using <code>crypto</code> random for boundary values <a href="3d1723080e"><code>3d17230</code></a></li>
<li>[Tests] fix linting errors <a href="5e340800b5"><code>5e34080</code></a></li>
<li>[meta] actually ensure the readme backup isn’t published <a href="316c82ba93"><code>316c82b</code></a></li>
<li>[Dev Deps] update <code>ljharb/eslint-config</code> <a href="58c25d7640"><code>58c25d7</code></a></li>
<li>[meta] fix readme capitalization <a href="2300ca1959"><code>2300ca1</code></a></li>
</ul>
<h2><a href="https://github.com/form-data/form-data/compare/v4.0.2...v4.0.3">v4.0.3</a> - 2025-06-05</h2>
<h3>Fixed</h3>
<ul>
<li>[Fix] <code>append</code>: avoid a crash on nullish values <a href="https://redirect.github.com/form-data/form-data/issues/577"><code>[#577](https://github.com/form-data/form-data/issues/577)</code></a></li>
</ul>
<h3>Commits</h3>
<ul>
<li>[eslint] use a shared config <a href="426ba9ac44"><code>426ba9a</code></a></li>
<li>[eslint] fix some spacing issues <a href="20941917f0"><code>2094191</code></a></li>
<li>[Refactor] use <code>hasown</code> <a href="81ab41b46f"><code>81ab41b</code></a></li>
<li>[Fix] validate boundary type in <code>setBoundary()</code> method <a href="8d8e469309"><code>8d8e469</code></a></li>
<li>[Tests] add tests to check the behavior of <code>getBoundary</code> with non-strings <a href="837b8a1f75"><code>837b8a1</code></a></li>
<li>[Dev Deps] remove unused deps <a href="870e4e6659"><code>870e4e6</code></a></li>
<li>[meta] remove local commit hooks <a href="e6e83ccb54"><code>e6e83cc</code></a></li>
<li>[Dev Deps] update <code>eslint</code> <a href="4066fd6f65"><code>4066fd6</code></a></li>
<li>[meta] fix scripts to use prepublishOnly <a href="c4bbb13c0e"><code>c4bbb13</code></a></li>
</ul>
<h2><a href="https://github.com/form-data/form-data/compare/v4.0.1...v4.0.2">v4.0.2</a> - 2025-02-14</h2>
<h3>Merged</h3>
<ul>
<li>[Fix] set <code>Symbol.toStringTag</code> when available <a href="https://redirect.github.com/form-data/form-data/pull/573"><code>[#573](https://github.com/form-data/form-data/issues/573)</code></a></li>
<li>[Fix] set <code>Symbol.toStringTag</code> when available <a href="https://redirect.github.com/form-data/form-data/pull/573"><code>[#573](https://github.com/form-data/form-data/issues/573)</code></a></li>
<li>fix (npmignore): ignore temporary build files <a href="https://redirect.github.com/form-data/form-data/pull/532"><code>[#532](https://github.com/form-data/form-data/issues/532)</code></a></li>
<li>fix (npmignore): ignore temporary build files <a href="https://redirect.github.com/form-data/form-data/pull/532"><code>[#532](https://github.com/form-data/form-data/issues/532)</code></a></li>
</ul>
<h3>Fixed</h3>
<ul>
<li>[Fix] set <code>Symbol.toStringTag</code> when available (<a href="https://redirect.github.com/form-data/form-data/issues/573">#573</a>) <a href="https://redirect.github.com/form-data/form-data/issues/396"><code>[#396](https://github.com/form-data/form-data/issues/396)</code></a></li>
<li>[Fix] set <code>Symbol.toStringTag</code> when available (<a href="https://redirect.github.com/form-data/form-data/issues/573">#573</a>) <a href="https://redirect.github.com/form-data/form-data/issues/396"><code>[#396](https://github.com/form-data/form-data/issues/396)</code></a></li>
<li>[Fix] set <code>Symbol.toStringTag</code> when available <a href="https://redirect.github.com/form-data/form-data/issues/396"><code>[#396](https://github.com/form-data/form-data/issues/396)</code></a></li>
</ul>
<h3>Commits</h3>
<ul>
<li>Merge tags v2.5.3 and v3.0.3 <a href="92613b9208"><code>92613b9</code></a></li>
<li>[Tests] migrate from travis to GHA <a href="806eda7774"><code>806eda7</code></a></li>
<li>[Tests] migrate from travis to GHA <a href="8fdb3bc6b5"><code>8fdb3bc</code></a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="41996f5ac7"><code>41996f5</code></a> v4.0.4</li>
<li><a href="316c82ba93"><code>316c82b</code></a> [meta] actually ensure the readme backup isn’t published</li>
<li><a href="2300ca1959"><code>2300ca1</code></a> [meta] fix readme capitalization</li>
<li><a href="811f68282f"><code>811f682</code></a> [meta] add <code>auto-changelog</code></li>
<li><a href="5e340800b5"><code>5e34080</code></a> [Tests] fix linting errors</li>
<li><a href="1d11a76434"><code>1d11a76</code></a> [Tests] handle predict-v8-randomness failures in node &lt; 17 and node &gt; 23</li>
<li><a href="58c25d7640"><code>58c25d7</code></a> [Dev Deps] update <code>ljharb/eslint-config</code></li>
<li><a href="3d1723080e"><code>3d17230</code></a> [Fix] Switch to using <code>crypto</code> random for boundary values</li>
<li><a href="d8d67dc8ac"><code>d8d67dc</code></a> v4.0.3</li>
<li><a href="e6e83ccb54"><code>e6e83cc</code></a> [meta] remove local commit hooks</li>
<li>Additional commits viewable in <a href="https://github.com/form-data/form-data/compare/v4.0.0...v4.0.4">compare view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a href="https://www.npmjs.com/~ljharb">ljharb</a>, a new releaser for form-data since your current version.</p>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=form-data&package-manager=npm_and_yarn&previous-version=4.0.0&new-version=4.0.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/kyuubi/network/alerts).

</details>

Closes #7157 from dependabot[bot]/dependabot/npm_and_yarn/kyuubi-server/web-ui/form-data-4.0.4.

Closes #7157

4d754d973 [dependabot[bot]] Bump form-data from 4.0.0 to 4.0.4 in /kyuubi-server/web-ui

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-08-07 15:04:00 +08:00
Cheng Pan
97f0bae87d
[KYUUBI #7148] Fix spark.kubernetes.file.upload.path permission
### Why are the changes needed?

The default behavior of HDFS is to set the permission of a file created with `FileSystem.create` or `FileSystem.mkdirs` to `(P & ^umask)`, where `P` is the permission in the API call and umask is a system value set by `fs.permissions.umask-mode` and defaults to `0022`. This means, with default settings, any mkdirs call can have at most `755` permissions.

The same issue also got reported in [SPARK-30860](https://issues.apache.org/jira/browse/SPARK-30860)

### How was this patch tested?

Manual test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7148 from pan3793/fs-mkdirs.

Closes #7148

7527060ac [Cheng Pan] fix
f64913277 [Cheng Pan] Fix spark.kubernetes.file.upload.path permission

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-07-22 16:41:32 +08:00
Wang, Fei
a3f1e51e78
[KYUUBI #7141] Support to get spark app url with pattern http://{{SPARK_DRIVER_POD_IP}}:{{SPARK_UI_PORT}}
### Why are the changes needed?

We are using [virtual-kubelet](https://github.com/virtual-kubelet/virtual-kubelet) for spark on kubernetes, and spark kubernetes pods would be allocated across kubernetes clusters.

And we use the driver POD ip as driver host, see https://github.com/apache/spark/pull/40392, which is supported since spark-3.5.

The kubernetes context and namespace are virtual and we can not build the app URL by spark driver svc.

And the spark driver pod IP is accessible for our use case, so raise this PR to build the spark app url by spark driver pod id and spark ui port.

### How was this patch tested?

UT.

<img width="1532" height="626" alt="image" src="https://github.com/user-attachments/assets/5cb54602-9e79-40b7-b51c-0b873c17560b" />
<img width="710" height="170" alt="image" src="https://github.com/user-attachments/assets/6d1c9580-62d6-423a-a04f-dc6cdcee940a" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7141 from turboFei/app_url_v2.

Closes #7141

1277952f5 [Wang, Fei] VAR
d15e6bea7 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
1535e00ac [Wang, Fei] spark driver pod ip

Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-07-21 10:11:07 +08:00
Wang, Fei
a1a08e7f93
[KYUUBI #7132] Respect kyuubi.session.engine.startup.waitCompletion for wait engine completion
### Why are the changes needed?

We should not fail the batch submission if the submit process is alive and wait engine completion is false.

Especially for spark on kubernetes, the app might failed with NOT_FOUND state if the spark submit process running time more than the submit timeout.

In this PR, if the `kyuubi.session.engine.startup.waitCompletion` is false, when getting the application info, it use current timestamp as submit time to prevent the app failed with NOT_FOUND state due to submit timeout.

### How was this patch tested?

Pass current GA and manually testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7132 from turboFei/batch_submit.

Closes #7132

efb06db1c [Wang, Fei] refine
7e453c162 [Wang, Fei] refine
7bca1a7aa [Wang, Fei] Prevent potential timeout durartion polling the application info
15529ab85 [Wang, Fei] prevent metadata manager fail
38335f2f9 [Wang, Fei] refine
9b8a9fde4 [Wang, Fei] comments
11f607daa [Wang, Fei] docs
f2f6ba148 [Wang, Fei] revert
2da0705ad [Wang, Fei] wait for if not wait complete
d84963420 [Wang, Fei] revert check in loop
b4cf50a49 [Wang, Fei] comments
8c262b7ec [Wang, Fei] refine
ecf379b86 [Wang, Fei] Revert conf change
60dc1676e [Wang, Fei] enlarge
4d0aa542a [Wang, Fei] Save
4aea96552 [Wang, Fei] refine
2ad75fcbf [Wang, Fei] nit
a71b11df6 [Wang, Fei] Do not fail batch if the process is alive

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-07-14 01:49:06 +08:00
wangzhigang
84928184fc
[KYUUBI #7121] Improve operation timeout management with configurable executors
### Why are the changes needed?

The current mechanism for handling operation timeouts in Kyuubi creates a new `ScheduledExecutorService` with a dedicated thread for each operation. In scenarios with a large number of concurrent operations, this results in excessive thread creation, which consumes substantial system resources and may adversely affect server performance and stability.

This PR introduces a shared `ScheduledThreadPool` within the Operation Manager to centrally schedule operation timeouts. This approach avoids the overhead of creating an excessive number of threads, thereby reducing the system load. Additionally, both the pool size and thread keep-alive time are configurable via the `OPERATION_TIMEOUT_POOL_SIZE` and `OPERATION_TIMEOUT_POOL_KEEPALIVE_TIME` parameters.

### How was this patch tested?

A new unit test for `newDaemonScheduledThreadPool` was added to `ThreadUtilsSuite.scala`. Furthermore, a dedicated `TimeoutSchedulerSuite` was introduced to verify operation timeout behavior.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7121 from wangzhigang1999/master.

Closes #7121

df7688dbf [wangzhigang] Refactor timeout management configuration and improve documentation
2b03b1e68 [wangzhigang] Remove deprecated `ThreadPoolTimeoutExecutor` class following refactor of operation timeout management.
52a8a516a [wangzhigang] Refactor operation timeout management to use per-OperationManager scheduler
7e46d47f8 [wangzhigang] Refactor timeout management by introducing ThreadPoolTimeoutExecutor
f7f10881a [wangzhigang] Add operation timeout management with ThreadPoolTimeoutExecutor
d8cd6c7d4 [wangzhigang] Update .gitignore to exclude .bloop and .metals directories

Lead-authored-by: wangzhigang <wangzhigang1999@live.cn>
Co-authored-by: wangzhigang <wzg443064@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-07-09 10:51:30 +08:00
Cheng Pan
efe9f5552c
[KYUUBI #6876] Fix hadoopConf for autoCreateFileUploadPath
### Why are the changes needed?

This change fixes two issues:
1. `KyuubiHadoopUtils.newHadoopConf` should `loadDefaults`, otherwise `core-site.xml`, `hdfs-site.xml` won't take effect.
2. To make it aware of Hadoop conf hot reload, we should use `KyuubiServer.getHadoopConf()`.

### How was this patch tested?

Manual test that `core-site.xml` takes affect, previously not.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7102 from pan3793/6876-followup.

Closes #6876

24989d688 [Cheng Pan] [KYUUBI #6876] Fix hadoopConf for autoCreateFileUploadPath

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-07-09 10:14:15 +08:00
Cheng Pan
e98ad7bf32
[KYUUBI #7112] Enhance test 'capture error from spark process builder' for Spark 4.0
### Why are the changes needed?

```
- capture error from spark process builder *** FAILED ***
  The code passed to eventually never returned normally. Attempted 167 times over 1.50233072485 minutes. Last failure message: "org.apache.kyuubi.KyuubiSQLException: 	Suppressed: org.apache.spark.util.Utils$OriginalTryStackTraceException: Full stacktrace of original doTryWithCallerStacktrace caller
   See more: /builds/lakehouse/kyuubi/kyuubi-server/target/work/kentyao/kyuubi-spark-sql-engine.log.2
  	at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
  	at org.apache.kyuubi.engine.ProcBuilder.$anonfun$start$1(ProcBuilder.scala:234)
  	at java.base/java.lang.Thread.run(Thread.java:840)
  .
  FYI: The last 4096 line(s) of log are:
...
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
  	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
  	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
  	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
  	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3607)
  	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3659)
  	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3639)
  	at org.apache.spark.sql.hive.client.HiveClientImpl$.$anonfun$getHive$5(HiveClientImpl.scala:1458)
...
  25/06/19 18:20:08 INFO SparkContext: Successfully stopped SparkContext
  25/06/19 18:20:08 INFO ShutdownHookManager: Shutdown hook called
  25/06/19 18:20:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-791ea5a0-44d2-4750-a549-a3ea2[3254](https://g.hz.netease.com/lakehouse/kyuubi/-/jobs/7667660#L3254)6b2
  25/06/19 18:20:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-1ab9d4a0-707d-4619-bc83-232c29c891f9
  25/06/19 18:20:08 INFO ShutdownHookManager: Deleting directory /builds/lakehouse/kyuubi/kyuubi-server/target/work/kentyao/artifacts/spark-9ee628b1-0c29-4d32-8078-c023d1f812d7" did not contain "org.apache.hadoop.hive.ql.metadata.HiveException:". (SparkProcessBuilderSuite.scala:79)
```

### How was this patch tested?

Pass GHA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7112 from pan3793/ut-spark-4.0.

Closes #7112

bd4a24bea [Cheng Pan] Enhance test 'capture error from spark process builder' for Spark 4.0

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-07-02 18:19:28 +08:00
cutiechi
4717987e37
[KYUUBI #7113] Skip Hadoop classpath check if flink-shaded-hadoop jar exists in Flink lib directory
### Why are the changes needed?

This change addresses an issue where the Flink engine in Kyuubi would perform a Hadoop classpath check even when a ‎`flink-shaded-hadoop` jar is already present in the Flink ‎`lib` directory. In such cases, the check is unnecessary and may cause confusion or warnings in environments where the shaded jar is used instead of a full Hadoop classpath. By skipping the check when a ‎`flink-shaded-hadoop` jar exists, we improve compatibility and reduce unnecessary log output.

### How was this patch tested?

The patch was tested by deploying Kyuubi with a Flink environment that includes a ‎`flink-shaded-hadoop` jar in the ‎`lib` directory and verifying that the classpath check is correctly skipped. Additional tests ensured that the check still occurs when neither the Hadoop classpath nor the shaded jar is present. Unit tests and manual verification steps were performed to confirm the fix.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7113 from cutiechi/fix/flink-classpath-missing-hadoop-check.

Closes #7113

99a4bf834 [cutiechi] fix(flink): fix process builder suite
7b9998760 [cutiechi] fix(flink): remove hadoop cp add
ea33258a3 [cutiechi] fix(flink): update flink hadoop classpath doc
6bb3b1dfa [cutiechi] fix(flink): optimize hadoop class path messages
c548ed6a1 [cutiechi] fix(flink): simplify classpath detection by merging hasHadoopJar conditions
9c16d5436 [cutiechi] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/flink/FlinkProcessBuilder.scala
0f729dcf9 [cutiechi] fix(flink): skip hadoop classpath check if flink-shaded-hadoop jar exists

Authored-by: cutiechi <superchijinpeng@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-07-02 17:33:07 +08:00
Wang, Fei
e769f42398 [KYUUBI #6884] [FEATURE] Support to reassign the batches to alternative kyuubi instance in case kyuubi instance lost
### Why are the changes needed?

Support to reassign the batches to alternative kyuubi instance in case kyuubi instance lost.
https://github.com/apache/kyuubi/issues/6884

### How was this patch tested?

Unit Test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #7037 from George314159/6884.

Closes #6884

8565d4aaa [Wang, Fei] KYUUBI_SESSION_CONNECTION_URL_KEY
22d4539e2 [Wang, Fei] admin
075654cb3 [Wang, Fei] check admin
5654a99f4 [Wang, Fei] log and lock
a19e2edf5 [Wang, Fei] minor comments
a60f23ba3 [George314159] refine
760e10f89 [George314159] Update Based On Comments
75f1ee2a9 [Fei Wang] ping (#1)
f42bcaf9a [George314159] Update Based on Comments
1bea70ed6 [George314159] [KYUUBI-6884] Support to reassign the batches to alternative kyuubi instance in case kyuubi instance lost

Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: George314159 <hua16732@gmail.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-06-22 22:36:51 -07:00
Wang, Fei
302b5fa1e6 [KYUUBI #7101] Load the existing pods when initializing kubernetes client to cleanup terminated app pods
### Why are the changes needed?

To prevent the terminated app pods leak if the events missed during kyuubi server restart.

### How was this patch tested?

Manual test.

```
:2025-06-17 17:50:37.275 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423211008-grectg-stm-17da59fe-caf4-41e4-a12f-6c1ed9a293f9-driver with label: kyuubi-unique-tag=17da59fe-caf4-41e4-a12f-6c1ed9a293f9 in app state FINISHED, marking it as terminated
2025-06-17 17:50:37.278 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423212011-gpdtsi-stm-6a23000f-10be-4a42-ae62-4fa2da8fac07-driver with label: kyuubi-unique-tag=6a23000f-10be-4a42-ae62-4fa2da8fac07 in app state FINISHED, marking it as terminated
```
The pods are cleaned up eventually.
<img width="664" alt="image" src="https://github.com/user-attachments/assets/8cf58f61-065f-4fb0-9718-2e3c00e8d2e0" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7101 from turboFei/pod_cleanup.

Closes #7101

7f76cf57c [Wang, Fei] async
11c9db25d [Wang, Fei] cleanup

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-06-22 22:35:14 -07:00
Wang, Fei
bada9c0411 [KYUUBI #7095] Respect terminated app state when building batch info from metadata
### Why are the changes needed?

Respect terminated app state when building batch info from metadata

It is a followup for https://github.com/apache/kyuubi/pull/2911,
9e40e39c39/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala (L128-L142)

1. if the kyuubi instance is unreachable during maintain window.
2. the batch app state has been terminated, and the app stated was backfilled by another kyuubi instance peer, see #2911
3. the batch state in the metadata table is still PENDING/RUNNING
4. return the terminated batch state for such case instead of `PENDING or RUNNING`.
### How was this patch tested?

GA and IT.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7095 from turboFei/always_respect_appstate.

Closes #7095

ec72666c9 [Wang, Fei] rename
bc74a9c56 [Wang, Fei] if op not terminated
e786c8d9b [Wang, Fei] respect terminated app state when building batch info from metadata

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-06-12 10:20:44 -07:00
Wang, Fei
9e40e39c39 [KYUUBI #7093] Log the metadata cleanup count
### Why are the changes needed?

To show how many metadata records cleaned up.
### How was this patch tested?

```
(base) ➜  kyuubi git:(delete_metadata) grep 'Cleaned up' target/unit-tests.log
01:58:17.109 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.124 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.144 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.161 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.180 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.199 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.216 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.236 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.253 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.270 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.290 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.310 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.327 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.348 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.368 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.384 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.400 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.419 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.437 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.456 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.475 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.493 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.513 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.533 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.551 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.569 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.590 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.611 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.631 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.651 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.668 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.688 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.705 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.725 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.744 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.764 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.784 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.801 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.822 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.849 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.870 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.889 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.910 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.929 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.948 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.970 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.994 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:18.014 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:18.032 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:18.050 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:18.069 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:18.086 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:18.108 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 1 records older than 1000 ms from metadata.
01:58:18.162 ScalaTest-run INFO JDBCMetadataStore: Cleaned up 0 records older than 0 ms from k8s_engine_info.
```
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7093 from turboFei/delete_metadata.

Closes #7093

e0cf300f8 [Wang, Fei] update

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-06-10 06:59:57 -07:00
Lennon Chin
cad5a392f3
[KYUUBI #7072] Expose metrics of engine startup permit state
### Why are the changes needed?

The metrics `kyuubi_operation_state_LaunchEngine_*` cannot reflect the state of Semaphore after configuring the maximum engine startup limit through `kyuubi.server.limit.engine.startup`, add some metrics to show the relevant permit state.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

Closes #7072 from LennonChin/engine_startup_metrics.

Closes #7072

d6bf3696a [Lennon Chin] Expose metrics of engine startup permit status

Authored-by: Lennon Chin <i@coderap.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-05-29 13:27:42 +08:00
taylor.fan
127c736a8f
[KYUUBI #6926] Add SERVER_LOCAL engine share level
### Why are the changes needed?

As clarified in https://github.com/apache/kyuubi/issues/6926, there are some scenarios user want to launch engine on each kyuubi server. SERVER_LOCAL engine share level implement this function by extracting local host address as subdomain, in which case each kyuubi server's engine is unique.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #7013 from taylor12805/share_level_server_local.

Closes #6926

ba201bb72 [taylor.fan] [KYUUBI #6926] update format
42f0a4f7d [taylor.fan] [KYUUBI #6926] move host address to subdomain
e06de79ad [taylor.fan] [KYUUBI #6926] Add SERVER_LOCAL engine share level

Authored-by: taylor.fan <taylor.fan@vipshop.com>
Signed-off-by: Kent Yao <yao@apache.org>
2025-04-29 10:42:50 +08:00
Wang, Fei
ecfca79328
[KYUUBI #7033] Treat YARN/Kubernetes application NOT_FOUND as failed to prevent data quality issue
### Why are the changes needed?

Currently, NOT_FOUND application stated is treated as a terminated but not failed state.

It might cause some data quality issue if downstream application depends on the batch state for data processing.

So, I think we should treat NOT_FOUND as a failed state instead.

Currently, we support 3 types of application manager.
1. [JpsApplicationOperation](https://github.com/apache/kyuubi/blob/master/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/JpsApplicationOperation.scala)
2. [YarnApplicationOperation](https://github.com/apache/kyuubi/blob/master/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/YarnApplicationOperation.scala)
3. [KubernetesApplicationOperation](https://github.com/apache/kyuubi/blob/master/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala)

YarnApplicationOperation and KubernetesApplicationOperation are widely used in production use case.

And in multiple kyuubi instance mode, the NOT_FOUND case should rarely happen.
1.  7e199d6fdb/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala (L369-L385)

3. https://github.com/apache/kyuubi/pull/7029

So, I think we should treat NOT_FOUND as a failed state in production use case.
It is better to fail some corner cases than to mistakenly set unsuccessful batches to the finished state.

### How was this patch tested?

GA.
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7033 from turboFei/revist_not_found.

Closes #7033

ada4f8822 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala
985e23c24 [Wang, Fei] Refine
f03d61242 [Wang, Fei] comments
b9d6ac203 [Wang, Fei] incase the metadata updated by peer instance
3bd61ca85 [Wang, Fei] add
339df4730 [Wang, Fei] treat NOT_FOUND as failed

Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-04-27 21:09:08 +08:00
Wang, Fei
02a6b13d77 [KYUUBI #7028] Persist the kubernetes application terminate state into metastore for app info store fallback
### Why are the changes needed?

1. Persist the kubernetes application terminate info into metastore to prevent the event lose.
2. If it can not get the application info from informer application info store, fallback to get the application info from metastore instead of return NOT_FOUND directly.
3. It is critical because if we return false application state, it might cause data quality issue.

### How was this patch tested?

UT and IT.

<img width="1917" alt="image" src="https://github.com/user-attachments/assets/306f417c-5037-4869-904d-dcf657ff8f60" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7029 from turboFei/kubernetes_state.

Closes #7028

9f2badef3 [Wang, Fei] generic dialect
186cc690d [Wang, Fei] nit
82ea62669 [Wang, Fei] Add pod name
4c59bebb5 [Wang, Fei] Refine
327a0d594 [Wang, Fei] Remove create_time from k8s engine info
12c24b1d0 [Wang, Fei] do not use MYSQL deprecated VALUES(col)
becf9d1a7 [Wang, Fei] insert or replace
d167623c1 [Wang, Fei] migration

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-27 01:37:27 -07:00
Wang, Fei
4cbff4d192 [KYUUBI #7045] Expose jetty metrics
### Why are the changes needed?

Expose the jetty metrics to help detect issue.
Refer: https://metrics.dropwizard.io/4.2.0/manual/jetty.html

### How was this patch tested?

<img width="1425" alt="image" src="https://github.com/user-attachments/assets/ac8c9a48-eaa1-48ee-afec-6f33980d4270" />

<img width="1283" alt="image" src="https://github.com/user-attachments/assets/c2fa444b-6337-4662-832b-3d02f206bd13" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7045 from turboFei/metrics_jetty.

Closes #7045

122b93f3d [Wang, Fei] metrics
45a73e7cd [Wang, Fei] metrics

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-25 00:02:56 -07:00
Wang, Fei
29b6076319 [KYUUBI #7043] Support to construct the batch info from metadata directly
### Why are the changes needed?

Add an option to allow construct the batch info from metadata directly instead of redirecting the requests to reduce the RPC latency.

### How was this patch tested?

Minor change and Existing GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7043 from turboFei/support_no_redirect.

Closes #7043

7f7a2fb80 [Wang, Fei] comments
bb0e324a1 [Wang, Fei] save

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-24 22:42:26 -07:00
Wang, Fei
75891d1a92 [KYUUBI #7034][FOLLOWUP] Decouple the kubernetes pod name and app name
### Why are the changes needed?

Followup for #7034  to fix the SparkOnKubernetesTestsSuite.

Sorry, I forget that the appInfo name and pod name were deeply bound before, the appInfo name was used as pod name and used to delete pod.

In this PR, we add `podName` into applicationInfo to separate app name and pod name.

### How was this patch tested?

GA should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7039 from turboFei/fix_test.

Closes #7034

0ff7018d6 [Wang, Fei] revert
18e48c079 [Wang, Fei] comments
19f34bc83 [Wang, Fei] do not get pod name from appName
c1d308437 [Wang, Fei] reduce interval for test stability
50fad6bc5 [Wang, Fei] fix ut

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-24 22:40:28 -07:00
Wang, Fei
ba854d3c99
[KYUUBI #7035] Close the operation by operation manager to prevent operation leak
### Why are the changes needed?

To fix the operation leak if the session init timeout.

1. the operationHandle has not been added into session `opHandleSet` (super.runOperation).
2. The `operation.close()` only close the operation, but it does not remove it from `handleToOperation` map.
3. the session close would not remove the opHandle from `handleToOperation` as it has not been added into session `opHandleSet`

So here we can resolve the operation leak by invoking `operationManager.closeOperation` to remove the operation handle and close session.
cc68cb4c85/kyuubi-server/src/main/scala/org/apache/kyuubi/session/KyuubiSessionImpl.scala (L235-L246)

cc68cb4c85/kyuubi-common/src/main/scala/org/apache/kyuubi/session/AbstractSession.scala (L100-L103)

cc68cb4c85/kyuubi-common/src/main/scala/org/apache/kyuubi/operation/OperationManager.scala (L127-L130)

cc68cb4c85/kyuubi-common/src/main/scala/org/apache/kyuubi/session/AbstractSession.scala (L89-L92)

FYI: the operation was added into `handleToOperation` during new operation.
cc68cb4c85/kyuubi-server/src/main/scala/org/apache/kyuubi/operation/KyuubiOperationManager.scala (L56-L64)

### How was this patch tested?

Minor change.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7035 from turboFei/remove_op.

Closes #7035

3c376833a [Wang, Fei] close by op mgr

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-04-24 14:14:07 +08:00
Wang, Fei
ee677a6feb [KYUUBI #7041] Fix NPE when getting metadtamanager in KubernetesApplicationOperation
### Why are the changes needed?

To fix NPE.

Before, we use below method to get `metadataManager`.
```
private def metadataManager = KyuubiServer.kyuubiServer.backendService
    .sessionManager.asInstanceOf[KyuubiSessionManager].metadataManager
```
But before the kyuubi server fully restarted, the `KyuubiServer.kyuubiServer` is null and might throw NPE during batch recovery phase.

For example:

```
:2025-04-23 14:06:24.040 ERROR [KyuubiSessionManager-exec-pool: Thread-231] org.apache.kyuubi.engine.KubernetesApplicationOperation: Failed to get application by label: kyuubi-unique-tag=95116703-4240-4cc1-9886-ccae3a2ac879, due to Cannot invoke "org.apache.kyuubi.server.KyuubiServer.backendService()" because the return value of "org.apache.kyuubi.server.KyuubiServer$.kyuubiServer()" is null
```

### How was this patch tested?

Existing GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7041 from turboFei/fix_NPE.

Closes #7041

064d88707 [Wang, Fei] Fix NPE

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-23 20:20:39 -07:00
Wang, Fei
cc68cb4c85 [KYUUBI #7034] [KUBERNETES] Prefer to use pod spark-app-name label as application name than pod name
### Why are the changes needed?

After https://github.com/apache/spark/pull/34460 (Since Spark 3.3.0), the `spark-app-name` is available.

We shall use it as the application name if it exists.

### How was this patch tested?

Minor change.
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7034 from turboFei/k8s_app_name.

Closes #7034

bfa88a436 [Wang, Fei] Get pod app name

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-16 19:28:04 -07:00
Wang, Fei
7e199d6fdb [KYUUBI #7025] [KYUUBI #6686][FOLLOWUP] Prefer terminated container app state than terminated pod state
### Why are the changes needed?

I found that, for a kyuubi batch on kubernetes.

1. It has been `FINISHED`.
2. then I delete the pod manually, then I check the k8s-audit.log, then the appState became `FAILED`.

```
2025-04-15 11:16:30.453 INFO [-675216314-pool-44-thread-839] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: label=61e7d8c1-e5a9-46cd-83e7-c611003f0224     context=97      namespace=dls-prod      pod=kyuubi-spark-61e7d8c1-e5a9-46cd-83e7-c611003f0224-driver podState=Running        containers=[microvault->ContainerState(running=ContainerStateRunning(startedAt=2025-04-15T18:13:48Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://72704f8e7ccb5e877c8f6b10bf6ad810d0c019e07e0cb5975be733e79762c1ec, exitCode=0, finishedAt=2025-04-15T18:14:22Z, message=null, reason=Completed, signal=null, startedAt=2025-04-15T18:13:49Z, additionalProperties={}), waiting=null, additionalProperties={})]   appId=spark-228c62e0dc37402bacac189d01b871e4    appState=FINISHED       appError=''
:2025-04-15 11:16:30.854 INFO [-675216314-pool-44-thread-840] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: label=61e7d8c1-e5a9-46cd-83e7-c611003f0224     context=97      namespace=dls-prod      pod=kyuubi-spark-61e7d8c1-e5a9-46cd-83e7-c611003f0224-driver podState=Failed containers=[microvault->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://91654e3ee74e2c31218e14be201b50a4a604c2ad15d3afd84dc6f620e59894b7, exitCode=2, finishedAt=2025-04-15T18:16:30Z, message=null, reason=Error, signal=null, startedAt=2025-04-15T18:13:48Z, additionalProperties={}), waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://72704f8e7ccb5e877c8f6b10bf6ad810d0c019e07e0cb5975be733e79762c1ec, exitCode=0, finishedAt=2025-04-15T18:14:22Z, message=null, reason=Completed, signal=null, startedAt=2025-04-15T18:13:49Z, additionalProperties={}), waiting=null, additionalProperties={})]    appId=spark-228c62e0dc37402bacac189d01b871e4    appState=FAILED appError='{
```

This PR is a followup for #6690 , which ignore the container state if POD is terminated.

It is more reasonable to respect the terminated container state than terminated pod state.

### How was this patch tested?

Integration testing.

```
:2025-04-15 13:53:24.551 INFO [-1077768163-pool-36-thread-3] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE	label=e0eb4580-3cfa-43bf-bdcc-efeabcabc93c	context=97	namespace=dls-prod	pod=kyuubi-spark-e0eb4580-3cfa-43bf-bdcc-efeabcabc93c-driver	podState=Failed	containers=[microvault->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://66c42206730950bd422774e3c1b0f426d7879731788cea609bbfe0daab24a763, exitCode=2, finishedAt=2025-04-15T20:53:22Z, message=null, reason=Error, signal=null, startedAt=2025-04-15T20:52:00Z, additionalProperties={}), waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://9179a73d9d9e148dcd9c13ee6cc29dc3e257f95a33609065e061866bb611cb3b, exitCode=0, finishedAt=2025-04-15T20:52:28Z, message=null, reason=Completed, signal=null, startedAt=2025-04-15T20:52:01Z, additionalProperties={}), waiting=null, additionalProperties={})]	appId=spark-578df0facbfd4958a07f8d1ae79107dc	appState=FINISHED	appError=''
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7025 from turboFei/container_terminated.

Closes #7025

Closes #6686

a3b2a5a56 [Wang, Fei] comments
4356d1bc9 [Wang, Fei] fix the app state logical

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-16 10:12:10 -07:00
Wang, Fei
82e1673cae [KYUUBI #7026] Audit the kubernetes pod event type and fix DELETE event process logical
### Why are the changes needed?

1. Audit the kubernetes resource event type.
2. Fix the process logical for DELETE event.

Before this pr:

I tried to delete the POD manually, then I saw that, kyuubi thought the `appState=PENDING`.
```
:2025-04-15 13:58:20.320 INFO [-1077768163-pool-36-thread-7] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE	label=3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8	context=97	namespace=dls-prod	pod=kyuubi-spark-3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8-driver	podState=Pending	containers=[]	appId=spark-cd125bbd9fc84ffcae6d6b5d41d4d8ad	appState=PENDING	appError=''
```

It seems that, the pod status in the event is the snapshot before pod deleted.

Then we would not receive any event for this POD, and finally the batch FINISHED with application `NOT_FOUND` .

<img width="1389" alt="image" src="https://github.com/user-attachments/assets/5df03db6-0924-4a58-9538-b196fbf87f32" />

Seems we need to process the DELETE event specially.

1. get the app state from the pod/container states
2. if the applicationState got is terminated, return the applicationState directly
3. otherwise, the applicationState should be FAILED, as the pod has been deleted.

### How was this patch tested?

<img width="1614" alt="image" src="https://github.com/user-attachments/assets/11e64c6f-ad53-4485-b8d2-a351bb23e8ca" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7026 from turboFei/k8s_audit.

Closes #7026

4e5695d34 [Wang, Fei] for delete
c16757218 [Wang, Fei] audit the pod event type

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-15 22:37:12 -07:00
Wang, Fei
4fc201e85d [KYUUBI #7027] Support to initialize kubernetes clients on kyuubi server startup
### Why are the changes needed?

This ensure the Kyuubi server is promptly informed for any Kubernetes resource changes after startup. It is highly recommend to set it for multiple Kyuubi instances mode.

### How was this patch tested?

Existing GA and Integration testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7027 from turboFei/k8s_client_init.

Closes #7027

393b9960a [Wang, Fei] server only
a640278c4 [Wang, Fei] refresh

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-15 22:36:16 -07:00
Wang, Fei
ec074fd202 [KYUUBI #7015] Record the session disconnected info into kyuubi session event
### Why are the changes needed?

Currently, if the kyuubi session between client and kyuubi session disconnected without closing properly, it is difficult to debug, and we have to check the kyuubi server log.

It is better that, we can record such kind of information into kyuubi session event.
### How was this patch tested?

IT.

<img width="1264" alt="image" src="https://github.com/user-attachments/assets/d2c5b6d0-6298-46ec-9b73-ce648551120c" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7015 from turboFei/disconnect.

Closes #7015

c95709284 [Wang, Fei] do not post
e46521410 [Wang, Fei] nit
bca7f9b7e [Wang, Fei] post
1cf6f8f49 [Wang, Fei] disconnect

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-10 00:09:33 -07:00
Wang, Fei
0cc52d035c [KYUUBI #7017] Using mutable JettyServer uri to prevent batch kyuubi instance mismatch
### Why are the changes needed?

To fix the batch kyuubi instance port is negative issue.
<img width="697" alt="image" src="https://github.com/user-attachments/assets/ef992390-8d20-44b3-8640-35496caff85d" />

It happen after I stop the kyuubi service.
We should use variable instead of function for jetty server serverUri.
After the server connector stopped, the localPort would be `-2`.

![image](https://github.com/user-attachments/assets/5152293d-9c2c-4979-bdcb-322f02928813)

### How was this patch tested?

Existing UT.
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7017 from turboFei/server_port_negative.

Closes #7017

3d34c4031 [Wang, Fei] warn
e58298646 [Wang, Fei] mutable server uri
2cbaf772a [Wang, Fei] Revert "hard code the server uri"
b64d91b32 [Wang, Fei] hard code the server uri

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-09 10:27:09 -07:00
Wang, Fei
d6f07a6b64 [KYUUBI #7011] Set kyuubi session engine client after opening engine session successfully
### Why are the changes needed?

Since https://github.com/apache/kyuubi/pull/3618
Kyuubi server could retry opening the engine when encountering a special error.
1937dd93f9/kyuubi-server/src/main/scala/org/apache/kyuubi/session/KyuubiSessionImpl.scala (L177-L212)

The `_client` might be reset and closed.

So, we shall set `_client` after open engine session successfully, as the `client` method is a public method.
### How was this patch tested?

Existing UT.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7011 from turboFei/client_ready.

Closes #7011

3ad57ee91 [Wang, Fei] fix npe
b956394fa [Wang, Fei] close internal engine client
523b48a4d [Wang, Fei] internal client
5baeedec1 [Wang, Fei] Revert "method"
84c808cfb [Wang, Fei] method
8efaa52f6 [Wang, Fei] check engine launched

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-04-03 09:41:27 -07:00
dependabot[bot]
560e0fbee1
Bump nanoid from 3.3.6 to 3.3.11 in /kyuubi-server/web-ui (#7001) 2025-03-25 14:44:57 +00:00
Cheng Pan
6459680d89
[KYUUBI #6998] [TEST] Harness SparkProcessBuilderSuite
### Why are the changes needed?

Fix the missing `assert` in `SparkProcessBuilderSuite - spark process builder`.

Fix the flaky test `SparkProcessBuilderSuite - capture error from spark process builder` by increasing `kyuubi.session.engine.startup.maxLogLines` from 10 to 4096, this is easy to fail, especially in Spark 4.0 due to increased error stack trace. for example, https://github.com/apache/kyuubi/actions/runs/13974413470/job/39290129824

```
SparkProcessBuilderSuite:
- spark process builder
- capture error from spark process builder *** FAILED ***
  The code passed to eventually never returned normally. Attempted 167 times over 1.5007926256666668 minutes. Last failure message: "org.apache.kyuubi.KyuubiSQLException: 	Suppressed: org.apache.spark.util.Utils$OriginalTryStackTraceException: Full stacktrace of original doTryWithCallerStacktrace caller
   See more: /home/runner/work/kyuubi/kyuubi/kyuubi-server/target/work/kentyao/kyuubi-spark-sql-engine.log.2
  	at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
  	at org.apache.kyuubi.engine.ProcBuilder.$anonfun$start$1(ProcBuilder.scala:239)
  	at java.base/java.lang.Thread.run(Thread.java:1583)
  .
  FYI: The last 10 line(s) of log are:
  25/03/24 12:53:39 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
  25/03/24 12:53:39 INFO MemoryStore: MemoryStore cleared
  25/03/24 12:53:39 INFO BlockManager: BlockManager stopped
  25/03/24 12:53:39 INFO BlockManagerMaster: BlockManagerMaster stopped
  25/03/24 12:53:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
  25/03/24 12:53:39 INFO SparkContext: Successfully stopped SparkContext
  25/03/24 12:53:39 INFO ShutdownHookManager: Shutdown hook called
  25/03/24 12:53:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-18455622-344e-48ac-92eb-4b368c35e697
  25/03/24 12:53:39 INFO ShutdownHookManager: Deleting directory /home/runner/work/kyuubi/kyuubi/kyuubi-server/target/work/kentyao/artifacts/spark-7479249b-44a2-4fe5-aa0f-544074f9c356
  25/03/24 12:53:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-5ba8250f-1ff2-4e0d-a365-27d7518308e1" did not contain "org.apache.hadoop.hive.ql.metadata.HiveException:". (SparkProcessBuilderSuite.scala:77)
```

### How was this patch tested?

Pass GHA, and verified locally with Spark 4.0.0 RC3 by running tests 10 times with constant success.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #6998 from pan3793/spark-pb-ut.

Closes #6998

a4290b413 [Cheng Pan] harness SparkProcessBuilderSuite

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-03-25 17:28:58 +08:00
Wang, Fei
196b47e32a [KYUUBI #6997] Get the latest batch app info after submit process terminated to prevent batch ERROR due to engine submit timeout
### Why are the changes needed?

We meet below issue:
For spark on yarn:
```
spark.yarn.submit.waitAppCompletion=false
kyuubi.engine.yarn.submit.timeout=PT10M
```

Due to network issue, the application submission was very slow.

It was submitted after 15 minutes.
<img width="1430" alt="image" src="https://github.com/user-attachments/assets/a326c3d1-4d39-42da-b6aa-cad5f8e7fc4b" />

<img width="1350" alt="image" src="https://github.com/user-attachments/assets/8e20056a-bd71-4515-a5e3-f881509a34b2" />

Then the batch failed from PENDING state to ERRO state directly, due to application state NOT_FOUND(exceeds the kyuubi.engine.yarn.submit.timeout).

a54ee39ab3/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala (L99-L106)

<img width="1727" alt="image" src="https://github.com/user-attachments/assets/20a2987c-675c-4136-a107-001f30b1b217" />

Here is the operation event:
<img width="1727" alt="image" src="https://github.com/user-attachments/assets/e2bab9c3-a959-4e2b-a207-813ae6489b30" />

But from the batch log, the current application status should be `PENDING`.
```
:2025-03-21 17:36:19.350 INFO [KyuubiSessionManager-exec-pool: Thread-176922] org.apache.kyuubi.operation.BatchJobSubmission: Batch report for bbba09c8-3704-4a87-8394-9bcbbd39cc34, Some(ApplicationInfo(application_1741747369441_2258235,6042072c-e8fa-425d-a6a3-3d5bbb4ec1e3-275732_6042072c-e8fa-425d-a6a3-3d5bbb4ec1e3-275732.e3a34b86-7fc7-43ea-b4a5-1b6f27df54b5.0_20250322002147.stm,PENDING,Some(https://apollo-rno-rm-2.vip.hadoop.ebay.com:50030/proxy/application_1741747369441_2258235/),Some()))
```

So, we should retrieve the batch application info after the submission process terminated before checking the application failed, to get the current application information to prevent the corner case:
1. the application submission time exceeds the `kyuubi.engine.yarn.submit.timeout` and the app state is NOT FOUND
2. can not get the application report before the submission process terminated
3. then the batch state to ERROR from PENDING directly.

Conclusion:

The application state transition was:

UNKNOWN(before submit timeout) -> NOT_FOUND(reach submit timeout) -> processExit -> batchOpError -> PENDING(updateApplicationInfoMetadataIfNeeded) -> UNKNOWN(batchError but app not terminated)

After this PR, it should be:

UNKNOWN(before submit timeout) -> NOT_FOUND(reach submit timeout) ->  processExit-> PENDING(after process terminated) -> ....

### How was this patch tested?

Existing GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6997 from turboFei/app_not_found_v2.

Closes #6997

370cf49e9 [Wang, Fei] v2
912ec28ca [Wang, Fei] nit
3c376f922 [Wang, Fei] log the op ex
d9cbdb87d [Wang, Fei] fix app not found

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-03-24 12:53:22 -07:00
Wang, Fei
338206e8a7 [KYUUBI #6785] Shutdown the executor service in KubernetesApplicationOperation and prevent NPE
# 🔍 Description
## Issue References 🔗

As title.

Fix NPE, because the cleanupTerminatedAppInfoTrigger will be set to `null`.
d3520ddbce/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala (L269)

Also shutdown the ExecutorService when KubernetesApplicationOperation stoped.
## Describe Your Solution 🔧

Shutdown the thread executor service and check the null.
## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6785 from turboFei/npe_k8s.

Closes #6785

6afd052e6 [Wang, Fei] comments
f0c3e3134 [Wang, Fei] prevent npe
9dffe0125 [Wang, Fei] shutdown

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-03-23 13:19:22 -07:00
Reese Feng
a54ee39ab3 [KYUUBI #6984] Fix ValueError when rendering MapType data
[
[KYUUBI #6984] Fix ValueError when rendering MapType data
](https://github.com/apache/kyuubi/issues/6984)

### Why are the changes needed?
The issue was caused by an incorrect iteration of MapType data in the `%table` magic command. When iterating over a `MapType` column, the code used `for k, v in m` directly, which leads to a `ValueError` because raw `Map` entries may not be properly unpacked

### How was this patch tested?
- [x] Manual testing:
  Executed a query with a `MapType` column and confirmed that the `%table` command now renders it without errors.
```python
 from pyspark.sql import SparkSession
 from pyspark.sql.types import MapType, StringType, IntegerType
 spark = SparkSession.builder \
     .appName("MapFieldExample") \
     .getOrCreate()

 data = [
     (1, {"a": "1", "b": "2"}),
     (2, {"x": "10"}),
     (3, {"key": "value"})
 ]

 schema = "id INT, map_col MAP<STRING, STRING>"
 df = spark.createDataFrame(data, schema=schema)
 df.printSchema()
 df2=df.collect()
```
using `%table` render table
```python
 %table df2
```

result
```python
{'application/vnd.livy.table.v1+json': {'headers': [{'name': 'id', 'type': 'INT_TYPE'}, {'name': 'map_col', 'type': 'MAP_TYPE'}], 'data': [[1, {'a': '1', 'b': '2'}], [2, {'x': '10'}], [3, {'key': 'value'}]]}}

```

### Was this patch authored or co-authored using generative AI tooling?
No

**notice** This PR was co-authored by DeepSeek-R1.

Closes #6985 from JustFeng/patch-1.

Closes #6984

e0911ba94 [Reese Feng] Update PySparkTests for magic cmd
bc3ce1a49 [Reese Feng] Update PySparkTests for magic cmd
200d7ad9b [Reese Feng] Fix syntax error in dict iteration in magic_table_convert_map

Authored-by: Reese Feng <10377945+JustFeng@users.noreply.github.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-03-19 21:18:35 -07:00
dependabot[bot]
5ed33c6cb9
⬆️ Bump axios from 1.7.4 to 1.8.2 in /kyuubi-server/web-ui (#6967) 2025-03-10 10:57:07 +00:00
dependabot[bot]
dd9cc0ed4f
[KYUUBI #6814] [UI] Bump cross-spawn from 7.0.3 to 7.0.6
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from 7.0.3 to 7.0.6.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/moxystudio/node-cross-spawn/blob/master/CHANGELOG.md">cross-spawn's changelog</a>.</em></p>
<blockquote>
<h3><a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.5...v7.0.6">7.0.6</a> (2024-11-18)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>update cross-spawn version to 7.0.5 in package-lock.json (<a href="f700743918">f700743</a>)</li>
</ul>
<h3><a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.4...v7.0.5">7.0.5</a> (2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>fix escaping bug introduced by backtracking (<a href="640d391fde">640d391</a>)</li>
</ul>
<h3><a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.4">7.0.4</a> (2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>disable regexp backtracking (<a href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>) (<a href="5ff3a07d9a">5ff3a07</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="77cd97f3ca"><code>77cd97f</code></a> chore(release): 7.0.6</li>
<li><a href="6717de49ff"><code>6717de4</code></a> chore: upgrade standard-version</li>
<li><a href="f700743918"><code>f700743</code></a> fix: update cross-spawn version to 7.0.5 in package-lock.json</li>
<li><a href="9a7e3b2165"><code>9a7e3b2</code></a> chore: fix build status badge</li>
<li><a href="085268352d"><code>0852683</code></a> chore(release): 7.0.5</li>
<li><a href="640d391fde"><code>640d391</code></a> fix: fix escaping bug introduced by backtracking</li>
<li><a href="bff0c87c8b"><code>bff0c87</code></a> chore: remove codecov</li>
<li><a href="a7c6abc6fe"><code>a7c6abc</code></a> chore: replace travis with github workflows</li>
<li><a href="9b9246e096"><code>9b9246e</code></a> chore(release): 7.0.4</li>
<li><a href="5ff3a07d9a"><code>5ff3a07</code></a> fix: disable regexp backtracking (<a href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.6">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cross-spawn&package-manager=npm_and_yarn&previous-version=7.0.3&new-version=7.0.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by yaooqinn.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/kyuubi/network/alerts).

</details>

Closes #6814 from dependabot[bot]/dependabot/npm_and_yarn/kyuubi-server/web-ui/cross-spawn-7.0.6.

Closes #6814

10dafbc6e [dependabot[bot]] ⬆️ Bump cross-spawn from 7.0.3 to 7.0.6 in /kyuubi-server/web-ui

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-01-23 20:01:00 +08:00
Cheng Pan
fff1841054
[KYUUBI #6876] Support rolling spark.kubernetes.file.upload.path
### Why are the changes needed?

The vanilla Spark neither support rolling nor expiration mechanism for `spark.kubernetes.file.upload.path`, if you use file system that does not support TTL, e.g. HDFS, additional cleanup mechanisms are needed to prevent the files in this directory from growing indefinitely.

This PR proposes to let `spark.kubernetes.file.upload.path` support placeholders `{{YEAR}}`, `{{MONTH}}` and `{{DAY}}` and introduce a switch `kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled` to let Kyuubi server create the directory with 777 permission automatically before submitting Spark application.

For example, the user can configure the below configurations in `kyuubi-defaults.conf` to enable monthly rolling support for `spark.kubernetes.file.upload.path`
```
kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled=true
spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}
```

Note that: spark would create sub dir `s"spark-upload-${UUID.randomUUID()}"` under the `spark.kubernetes.file.upload.path` for each uploading, the administer still needs to clean up the staging directory periodically.

For example:
```
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-f2b71340-dc1d-4940-89e2-c5fc31614eb4
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-173a8653-4d3e-48c0-b8ab-b7f92ae582d6
hdfs://hadoop-cluster/spark-upload-202501/spark-upload-3b22710f-a4a0-40bb-a3a8-16e481038a63
```

Administer can safely delete the `hdfs://hadoop-cluster/spark-upload-202412` after 20250101

### How was this patch tested?

New UTs are added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6876 from pan3793/rolling-upload.

Closes #6876

6614bf29c [Cheng Pan] comment
5d5cb3eb3 [Cheng Pan] docs
343adaefb [Cheng Pan] review
3eade8bc4 [Cheng Pan] fix
706989778 [Cheng Pan] docs
38953dc3f [Cheng Pan] Support rolling spark.kubernetes.file.upload.path

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-01-15 01:27:12 +08:00
Wang, Fei
26174278c5
[KYUUBI #6883] Using withOauthTokenProvider instead of withOauthToken to support token refresh
### Why are the changes needed?

Address comments: https://github.com/apache/kyuubi/discussions/6877#discussioncomment-11743818

> I guess this is a Kyuubi implementation issue, we just read the content from the kyuubi.kubernetes.authenticate.oauthTokenFile and call ConfigBuilder.withOauthToken, I guess this approach does not support token refresh...

### How was this patch tested?

Existing GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6883 from turboFei/k8s_token_provider.

Closes #6883

69dd28d27 [Wang, Fei] comments
a01040f94 [Wang, Fei] withOauthTokenProvider

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-01-15 01:25:34 +08:00
liupeiyue
a051253774
[KYUUBI #6843] Fix 'query-timeout-thread' thread leak
### Why are the changes needed?

see https://github.com/apache/kyuubi/issues/6843

If the session manager's ThreadPoolExecutor refuses to execute asyncOperation,   then we need to shut down the query-timeout-thread in the catch

### How was this patch tested?

 1 Use jstack to view threads on the long-lived engine side
![image](https://github.com/user-attachments/assets/95d3a897-001d-4250-bf13-172b6997021b)

 2  Wait for all SQL statements in the engine to finish executing, and then use stack to check the number of query-timeout-thread threads, which should be empty.
![image](https://github.com/user-attachments/assets/0afbc026-7dd3-4594-afd2-92a5ef23f6cb)

### Was this patch authored or co-authored using generative AI tooling?

NO

Closes #6844 from ASiegeLion/master.

Closes #6843

9107a300e [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
4b3417f21 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
ef1f66bb5 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
9e1a015f6 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
78a9fde09 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak

Authored-by: liupeiyue <liupeiyue@yy.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-12-27 18:02:16 +08:00
Wang, Fei
164df8d466 [KYUUBI #6866][FOLLOWUP] Prevent register gauge conflicts if both thrift binary SSL and thrift http SSL enabled
### Why are the changes needed?

Followup for https://github.com/apache/kyuubi/pull/6866
It would throw exception if both thrift binary SSL and thrift http SSL enabled

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6872 from turboFei/duplicate_gauge.

Closes #6866

ea356766e [Wang, Fei] prevent conflicts
982f175fd [Wang, Fei] conflicts

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2024-12-26 18:22:48 -08:00
Wang, Fei
53034a3a14
[KYUUBI #6866] Add metrics for SSL keystore expiration time
### Why are the changes needed?

Add metrics for SSL keystore expiration, then we can add alert if the keystore will expire in 1 month.

### How was this patch tested?

Integration testing.
<img width="1721" alt="image" src="https://github.com/user-attachments/assets/f4ef6af6-923b-403c-a80d-06dbb80dbe1c" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6866 from turboFei/keystore_expire.

Closes #6866

77c6db0a7 [Wang, Fei] Add metrics for SSL keystore expiration time #6866

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-12-26 14:04:05 +08:00
Wang, Fei
3167692732
[KYUUBI #6829] Add metrics for batch pending max elapse time
### Why are the changes needed?

1. add metrics `kyuubi.operartion.batch_pending_max_elapse` for the batch pending max elapse time, which is helpful for batch health monitoring, and we can send alert if the batch pending elapse time too long
2. For `GET /api/v1/batches` api, limit the max time window for listing batches, which is helpful that, we want to reserve more metadata in kyuubi server end, for example: 90 days, but for list batches, we just want to allow user to search the last 7 days. It is optional. And if `create_time` is specified, order by `create_time` instead of `key_id`.
68a6f48da5/kyuubi-server/src/main/resources/sql/mysql/metadata-store-schema-1.8.0.mysql.sql (L32)

### How was this patch tested?

GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6829 from turboFei/batch_pending_time.

Closes #6829

ee4f93125 [Wang, Fei] docs
bf8169ad4 [Wang, Fei] comments
f493a2af8 [Wang, Fei] new config
ab7b6db65 [Wang, Fei] ut
168017587 [Wang, Fei] in memory session
510a30b6a [Wang, Fei] batchSearchWindow opt
1e93dd276 [Wang, Fei] save

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-12-05 18:12:39 +08:00
Joao Amaral
27c734ed95 [KYUUBI #6722] Fix AppState when Engine connection is terminated
# 🔍 Description
## Issue References 🔗

This issue was noticed a few times when the batch `state` was `set` to `ERROR`, but the `appState` kept the non-terminal state forever (e.g. `RUNNING`), even if the application was finished (in this case Yarn Application).

```json
{
"id": "********",
"user": "****",
"batchType": "SPARK",
"name": "*********",
"appStartTime": 0,
"appId": "********",
"appUrl": "********",
"appState": "RUNNING",
"appDiagnostic": "",
"kyuubiInstance": "*********",
"state": "ERROR",
"createTime": 1725343207318,
"endTime": 1725343300986,
"batchInfo": {}
}
```

It seems that this happens when there is some intermittent failure during the monitoring step and the batch ends with ERROR, leaving the application metadata without an update. This can lead to some misinterpretation that the application is still running. We need to set this to `UNKNOWN` state to avoid errors.

## Describe Your Solution 🔧

This is a simple fix that only checks if the batch state is `ERROR` and the appState is not in a terminal state and changes the `appState` to `UNKNOWN`, in these cases (during the batch metadata update).

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

If there is some error between the Kyuubi and the Application request (e.g. YARN client), the batch is finished with `ERROR` state and the application keeps the last know state (e.g. RUNNING).

#### Behavior With This Pull Request 🎉

If there is some error between the Kyuubi and the Application request (e.g. YARN client), the batch is finished with `ERROR `state and the application has a non-terminal state, it is forced to `UNKNOWN` state.

#### Related Unit Tests

I've tried to implement a unit test to replicate this behavior but I didn't make it. We need to force an exception in the Engine Request (e.g. `YarnClient.getApplication`) but we need to wait for the application to be in the RUNNING state before raising this exception, or maybe block the connection between kyuubi and the engine.

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6722 from joaopamaral/fix/app-state-on-batch-error.

Closes #6722

8409eacac [Wang, Fei] fix
da8c356a7 [Joao Amaral] format fix
73b77b3f7 [Joao Amaral] use isTerminated
64f96a256 [Joao Amaral] Remove test
1eb80ef73 [Joao Amaral] Remove test
13498fa6b [Joao Amaral] Remove test
60ce55ef3 [Joao Amaral] add todo
3a3ba162b [Joao Amaral] Fix
215ac665f [Joao Amaral] Fix AppState when Engine connection is terminated

Lead-authored-by: Joao Amaral <7281460+joaopamaral@users.noreply.github.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2024-11-22 22:10:57 -08:00
senmiaoliu
c9d9433f74 [KYUUBI #6787] Improve the compatibility of queryTimeout in more version clients
# 🔍 Description
## Issue References 🔗

This pull request fixes #2112

## Describe Your Solution 🔧

Similar to #2113, the query-timeout-thread should verify the Thrift protocol version. For protocol versions <= HIVE_CLI_SERVICE_PROTOCOL_V8, it should convert TIMEDOUT_STATE to CANCELED.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6787 from lsm1/branch-timer-checker-set-cancel.

Closes #6787

9fbe1ac97 [senmiaoliu] add isHive21OrLower method
0c77c6f6f [senmiaoliu] time checker set cancel state

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: senmiaoliu <senmiaoliu@trip.com>
2024-11-04 19:12:48 +08:00
Bowen Liang
d3520ddbce [KYUUBI #6769] [RELEASE] Bump 1.11.0-SNAPSHOT
# 🔍 Description
## Issue References 🔗

This pull request fixes #

## Describe Your Solution 🔧

Preparing v1.11.0-SNAPSHOT after branch-1.10 cut

```shell
build/mvn versions:set -DgenerateBackupPoms=false -DnewVersion="1.11.0-SNAPSHOT"
(cd kyuubi-server/web-ui && npm version "1.11.0-SNAPSHOT")
```

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6769 from bowenliang123/bump-1.11.

Closes #6769

6db219d28 [Bowen Liang] get latest_branch by sorting version in branch name
465276204 [Bowen Liang] update package.json
81f2865e5 [Bowen Liang] bump

Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
2024-10-23 17:10:56 +08:00
senmiaoliu
f876600c4a [KYUUBI #6772] Fix ProcessBuilder to properly handle Java opts as a list
# 🔍 Description
## Issue References 🔗

## Describe Your Solution 🔧

This PR addresses an issue in the ProcessBuilder class where Java options passed as a single string (e.g., "-Dxxx -Dxxx") do not take effect. The command list must separate these options into individual elements to ensure they are recognized correctly by the Java runtime.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6772 from lsm1/branch-fix-processBuilder.

Closes #6772

fb6d53234 [senmiaoliu] fix process builder java opts

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
2024-10-23 13:28:58 +08:00
wforget
1e9d68b000 [KYUUBI #6368] Flink engine supports user impersonation
# 🔍 Description
## Issue References 🔗

This pull request fixes #6368

## Describe Your Solution 🔧

Support impersonation mode for flink sql engine.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

Test in hadoop-testing env.

Connection:

```
beeline -u "jdbc:hive2://hadoop-master1.orb.local:10009/default;hive.server2.proxy.user=spark;principal=kyuubi/_HOSTTEST.ORG?kyuubi.engine.type=FLINK_SQL;flink.execution.target=yarn-application;kyuubi.engine.share.level=CONNECTION;kyuubi.engine.flink.doAs.enabled=true;"
```

sql:

```
select 1;
```

result:

![image](https://github.com/apache/kyuubi/assets/17894939/4bde3e4e-0dac-4e09-ac7c-a2c3a3607a13)

launch engine command:

```
2024-06-12 03:22:10.242 INFO KyuubiSessionManager-exec-pool: Thread-62 org.apache.kyuubi.engine.EngineRef: Launching engine:
/opt/flink-1.18.1/bin/flink run-application \
	-t yarn-application \
	-Dyarn.ship-files=/opt/flink/opt/flink-sql-client-1.18.1.jar;/opt/flink/opt/flink-sql-gateway-1.18.1.jar;/etc/hive/conf/hive-site.xml \
	-Dyarn.application.name=kyuubi_CONNECTION_FLINK_SQL_spark_6170b9aa-c690-4b50-938f-d59cca9aa2d6 \
	-Dyarn.tags=KYUUBI,6170b9aa-c690-4b50-938f-d59cca9aa2d6 \
	-Dcontainerized.master.env.FLINK_CONF_DIR=. \
	-Dcontainerized.master.env.HIVE_CONF_DIR=. \
	-Dyarn.security.appmaster.delegation.token.services=kyuubi \
	-Dsecurity.delegation.token.provider.HiveServer2.enabled=false \
	-Dsecurity.delegation.token.provider.hbase.enabled=false \
	-Dexecution.target=yarn-application \
	-Dsecurity.module.factory.classes=org.apache.flink.runtime.security.modules.JaasModuleFactory;org.apache.flink.runtime.security.modules.ZookeeperModuleFa
ctory \
	-Dsecurity.delegation.token.provider.hadoopfs.enabled=false \
	-c org.apache.kyuubi.engine.flink.FlinkSQLEngine /opt/apache-kyuubi-1.10.0-SNAPSHOT-bin/externals/engines/flink/kyuubi-flink-sql-engine_2.12-1.10.0-SNAPS
HOT.jar \
	--conf kyuubi.session.user=spark \
	--conf kyuubi.client.ipAddress=172.20.0.5 \
	--conf kyuubi.engine.credentials=SERUUwACJnRocmlmdDovL2hhZG9vcC1tYXN0ZXIxLm9yYi5sb2NhbDo5MDgzRQAFc3BhcmsEaGl2ZShreXV1YmkvaGFkb29wLW1hc3RlcjEub3JiLmxvY2Fs
QFRFU1QuT1JHigGQCneevIoBkC6EIrwWDxSg03pnAB8dA295wh+Dim7Fx4FNxhVISVZFX0RFTEVHQVRJT05fVE9LRU4ADzE3Mi4yMC4wLjU6ODAyMEEABXNwYXJrAChreXV1YmkvaGFkb29wLW1hc3RlcjEub3JiL
mxvY2FsQFRFU1QuT1JHigGQCneekIoBkC6EIpBHHBSket0SQnlXT5EIMN0U2fUKFRIVvBVIREZTX0RFTEVHQVRJT05fVE9LRU4PMTcyLjIwLjAuNTo4MDIwAA== \
	--conf kyuubi.engine.flink.doAs.enabled=true \
	--conf kyuubi.engine.hive.extra.classpath=/opt/hadoop/share/hadoop/client/*:/opt/hadoop/share/hadoop/mapreduce/* \
	--conf kyuubi.engine.share.level=CONNECTION \
	--conf kyuubi.engine.submit.time=1718162530017 \
	--conf kyuubi.engine.type=FLINK_SQL \
	--conf kyuubi.frontend.protocols=THRIFT_BINARY,REST \
	--conf kyuubi.ha.addresses=hadoop-master1.orb.local:2181 \
	--conf kyuubi.ha.engine.ref.id=6170b9aa-c690-4b50-938f-d59cca9aa2d6 \
	--conf kyuubi.ha.namespace=/kyuubi_1.10.0-SNAPSHOT_CONNECTION_FLINK_SQL/spark/6170b9aa-c690-4b50-938f-d59cca9aa2d6 \
	--conf kyuubi.server.ipAddress=172.20.0.5 \
	--conf kyuubi.session.connection.url=hadoop-master1.orb.local:10009 \
	--conf kyuubi.session.engine.startup.waitCompletion=false \
	--conf kyuubi.session.real.user=spark
```

launch engine log:

![image](https://github.com/apache/kyuubi/assets/17894939/590463a8-2858-47a2-8897-0ddfbe3ffdf6)

jobmanager job:

```
2024-06-12 03:22:26,400 INFO  org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - Loading delegation token providers
2024-06-12 03:22:26,992 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenProvider [] - Renew delegation token with engine credentials: SERUUwACJnRocmlmdDovL2hhZG9vcC1tYXN0ZXIxLm9yYi5sb2NhbDo5MDgzRQAFc3BhcmsEaGl2ZShreXV1YmkvaGFkb29wLW1hc3RlcjEub3JiLmxvY2FsQFRFU1QuT1JHigGQCneevIoBkC6EIrwWDxSg03pnAB8dA295wh+Dim7Fx4FNxhVISVZFX0RFTEVHQVRJT05fVE9LRU4ADzE3Mi4yMC4wLjU6ODAyMEEABXNwYXJrAChreXV1YmkvaGFkb29wLW1hc3RlcjEub3JiLmxvY2FsQFRFU1QuT1JHigGQCneekIoBkC6EIpBHHBSket0SQnlXT5EIMN0U2fUKFRIVvBVIREZTX0RFTEVHQVRJT05fVE9LRU4PMTcyLjIwLjAuNTo4MDIwAA==
2024-06-12 03:22:27,100 INFO  org.apache.kyuubi.engine.flink.FlinkEngineUtils              [] - Add new unknown token Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 05 73 70 61 72 6b 04 68 69 76 65 28 6b 79 75 75 62 69 2f 68 61 64 6f 6f 70 2d 6d 61 73 74 65 72 31 2e 6f 72 62 2e 6c 6f 63 61 6c 40 54 45 53 54 2e 4f 52 47 8a 01 90 0a 77 9e bc 8a 01 90 2e 84 22 bc 16 0f
2024-06-12 03:22:27,104 WARN  org.apache.kyuubi.engine.flink.FlinkEngineUtils              [] - Ignore token with earlier issue date: Kind: HDFS_DELEGATION_TOKEN, Service: 172.20.0.5:8020, Ident: (token for spark: HDFS_DELEGATION_TOKEN owner=spark, renewer=, realUser=kyuubi/hadoop-master1.orb.localTEST.ORG, issueDate=1718162529936, maxDate=1718767329936, sequenceNumber=71, masterKeyId=28)
2024-06-12 03:22:27,104 INFO  org.apache.kyuubi.engine.flink.FlinkEngineUtils              [] - Update delegation tokens. The number of tokens sent by the server is 2. The actual number of updated tokens is 1.
......
4-06-12 03:22:29,414 INFO  org.apache.flink.runtime.security.token.DefaultDelegationTokenManager [] - Starting tokens update task
2024-06-12 03:22:29,415 INFO  org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - New delegation tokens arrived, sending them to receivers
2024-06-12 03:22:29,422 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Updating delegation tokens for current user
2024-06-12 03:22:29,422 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Token Service: Identifier:[10, 13, 10, 9, 8, 10, 16, -78, -36, -49, -17, -5, 49, 16, 1, 16, -100, -112, -60, -127, -8, -1, -1, -1, -1, 1]
2024-06-12 03:22:29,422 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Token Service: Identifier:[0, 5, 115, 112, 97, 114, 107, 4, 104, 105, 118, 101, 40, 107, 121, 117, 117, 98, 105, 47, 104, 97, 100, 111, 111, 112, 45, 109, 97, 115, 116, 101, 114, 49, 46, 111, 114, 98, 46, 108, 111, 99, 97, 108, 64, 84, 69, 83, 84, 46, 79, 82, 71, -118, 1, -112, 10, 119, -98, -68, -118, 1, -112, 46, -124, 34, -68, 22, 15]
2024-06-12 03:22:29,422 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Token Service:172.20.0.5:8020 Identifier:[0, 5, 115, 112, 97, 114, 107, 0, 40, 107, 121, 117, 117, 98, 105, 47, 104, 97, 100, 111, 111, 112, 45, 109, 97, 115, 116, 101, 114, 49, 46, 111, 114, 98, 46, 108, 111, 99, 97, 108, 64, 84, 69, 83, 84, 46, 79, 82, 71, -118, 1, -112, 10, 119, -98, -112, -118, 1, -112, 46, -124, 34, -112, 71, 28]
2024-06-12 03:22:29,422 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Updated delegation tokens for current user successfully

```

taskmanager log:

```
2024-06-12 03:45:06,622 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Receive initial delegation tokens from resource manager
2024-06-12 03:45:06,627 INFO  org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - New delegation tokens arrived, sending them to receivers
2024-06-12 03:45:06,628 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Updating delegation tokens for current user
2024-06-12 03:45:06,629 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Token Service: Identifier:[10, 13, 10, 9, 8, 10, 16, -78, -36, -49, -17, -5, 49, 16, 1, 16, -100, -112, -60, -127, -8, -1, -1, -1, -1, 1]
2024-06-12 03:45:06,630 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Token Service: Identifier:[0, 5, 115, 112, 97, 114, 107, 4, 104, 105, 118, 101, 40, 107, 121, 117, 117, 98, 105, 47, 104, 97, 100, 111, 111, 112, 45, 109, 97, 115, 116, 101, 114, 49, 46, 111, 114, 98, 46, 108, 111, 99, 97, 108, 64, 84, 69, 83, 84, 46, 79, 82, 71, -118, 1, -112, 10, 119, -98, -68, -118, 1, -112, 46, -124, 34, -68, 22, 15]
2024-06-12 03:45:06,630 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Token Service:172.20.0.5:8020 Identifier:[0, 5, 115, 112, 97, 114, 107, 0, 40, 107, 121, 117, 117, 98, 105, 47, 104, 97, 100, 111, 111, 112, 45, 109, 97, 115, 116, 101, 114, 49, 46, 111, 114, 98, 46, 108, 111, 99, 97, 108, 64, 84, 69, 83, 84, 46, 79, 82, 71, -118, 1, -112, 10, 119, -98, -112, -118, 1, -112, 46, -124, 34, -112, 71, 28]
2024-06-12 03:45:06,636 INFO  org.apache.kyuubi.engine.flink.security.token.KyuubiDelegationTokenReceiver [] - Updated delegation tokens for current user successfully
2024-06-12 03:45:06,636 INFO  org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Delegation tokens sent to receivers
```

#### Related Unit Tests

---

# Checklist 📝

- [X] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6383 from wForget/KYUUBI-6368.

Closes #6368

47df43ef0 [wforget] remove doAsEnabled
984b96c74 [wforget] update settings.md
c7f8d474e [wforget] make generateTokenFile conf to internal
8632176b1 [wforget] address comments
2ec270e8a [wforget] licenses
ed0e22f4e [wforget] separate kyuubi-flink-token-provider module
b66b855b6 [wforget] address comment
d4fc2bd1d [wforget] fix
1a3dc4643 [wforget] fix style
825e2a7a0 [wforget] address comments
a679ba1c2 [wforget] revert remove renewer
cdd499b95 [wforget] fix and comment
19caec6c0 [wforget] pass token to submit process
b2991d419 [wforget] fix
7c3bdde1b [wforget] remove security.delegation.tokens.enabled check
8987c9176 [wforget] fix
5bd8cfe7c [wforget] fix
08992642d [wforget] Implement KyuubiDelegationToken Provider/Receiver
fa16d7def [wforget] enable delegation token manager
e50db7497 [wforget] [KYUUBI #6368] Support impersonation mode for flink sql engine

Authored-by: wforget <643348094@qq.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
2024-10-21 17:32:39 +08:00
Bowen Liang
fb65a12936 [KYUUBI #6756] [REST] Check max file size of uploaded resource and extra resources in batch creation
# 🔍 Description
## Issue References 🔗

This pull request fixes #

## Describe Your Solution 🔧

Check the uploaded resource files when creating batch via REST API
- add config `kyuubi.batch.resource.file.max.size` for resource file's max size in bytes
- add config `kyuubi.batch.extra.resource.file.max.size` for each extra resource file's max size in bytes

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6756 from bowenliang123/resource-maxsize.

Closes #6756

5c409c425 [Bowen Liang] nit
4b16bcfc4 [Bowen Liang] nit
743920d25 [Bowen Liang] check resource file size max size

Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
2024-10-21 16:04:33 +08:00
wforget
535c4f90bc
[KYUUBI #6753] Set hadoop fs delegation token renewer to empty
# 🔍 Description
## Issue References 🔗

Allow delegation tokens to be used and renewed by yarn resourcemanager. (used in proxy user mode of flink engine, address https://github.com/apache/kyuubi/pull/6383#discussion_r1635768060)

## Describe Your Solution 🔧

Set hadoop fs delegation token renewer to empty.

## Types of changes 🔖

- [X] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [X] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6753 from wForget/renewer.

Closes #6753

f2e1f0aa1 [wforget] Set hadoop fs delegation token renewer to empty

Authored-by: wforget <643348094@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-10-18 10:56:33 +08:00
Wang, Fei
cefb98d27b [KYUUBI #6750] [REST] Using ForbiddenException instead of NotAllowedException
# 🔍 Description
## Issue References 🔗

Seems NotAllowedException is used for method not allowed, and currently, we use false constructor, the error message we expected would not be return to client end.

It only told:
```
{"message":"HTTP 405 Method Not Allowed"}
```
Because the message we used to build the NotAllowedException was treated as `allowed` method, not as `message`.

![image](https://github.com/user-attachments/assets/3199f20c-6148-4e6a-9183-7a0843913d8d)

## Describe Your Solution 🔧

We should use the ForbidenException instead, and then the error message we excepted can be visible in client end.

85dd5a52ef/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/api.scala (L47-L51)

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

<img width="913" alt="image" src="https://github.com/user-attachments/assets/6c4e836d-a47a-485d-85a3-fd3a35a9e425">

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6750 from turboFei/not_allowed_exception.

Closes #6750

4dd6fc18c [Wang, Fei] Using ForbiddenException instead of NotAllowedException

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
2024-10-18 09:20:52 +08:00
Bowen Liang
928e3d243f [KYUUBI #6731] [REST] Check all required extra resource files uploaded in creating batch request
# 🔍 Description
## Issue References 🔗

This pull request fixes #

## Describe Your Solution 🔧

- check all the required extra resource files are uploaded in POST multi-part request as expected, when creating batch with REST Batch API

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6731 from bowenliang123/extra-resource-check.

Closes #6731

116a47ea5 [Bowen Liang] update
cd4433a8c [Bowen Liang] update
4852b1569 [Bowen Liang] update
5bb2955e8 [Bowen Liang] update
1696e7328 [Bowen Liang] update
911a9c195 [Bowen Liang] update
042e42d23 [Bowen Liang] update
56dc7fb8a [Bowen Liang] update

Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
2024-10-18 08:44:13 +08:00