### Why are the changes needed?
Support reassigning batches to an alternative Kyuubi instance in case a Kyuubi instance is lost.
https://github.com/apache/kyuubi/issues/6884
### How was this patch tested?
Unit Test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #7037 from George314159/6884.
Closes #6884
8565d4aaa [Wang, Fei] KYUUBI_SESSION_CONNECTION_URL_KEY
22d4539e2 [Wang, Fei] admin
075654cb3 [Wang, Fei] check admin
5654a99f4 [Wang, Fei] log and lock
a19e2edf5 [Wang, Fei] minor comments
a60f23ba3 [George314159] refine
760e10f89 [George314159] Update Based On Comments
75f1ee2a9 [Fei Wang] ping (#1)
f42bcaf9a [George314159] Update Based on Comments
1bea70ed6 [George314159] [KYUUBI-6884] Support to reassign the batches to alternative kyuubi instance in case kyuubi instance lost
Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: George314159 <hua16732@gmail.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
To prevent terminated app pods from leaking when events are missed during a Kyuubi server restart.
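The recovery step described above can be sketched roughly as follows (plain Python with hypothetical names; the real implementation lists driver pods via the Kubernetes client and matches the `kyuubi-unique-tag` label):

```python
# Terminal application states after which a pod can be safely cleaned up.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}

def reconcile_existing_pods(pods):
    """pods: list of dicts with 'name' and 'app_state' keys.

    On server startup, scan pre-existing driver pods and mark those whose
    application already terminated, so a DELETE event missed during the
    restart cannot leak the pod forever.
    """
    terminated = []
    for pod in pods:
        if pod["app_state"] in TERMINAL_STATES:
            terminated.append(pod["name"])
    return terminated

pods = [
    {"name": "driver-1", "app_state": "FINISHED"},
    {"name": "driver-2", "app_state": "RUNNING"},
]
print(reconcile_existing_pods(pods))  # ['driver-1']
```

Only `driver-1` is marked; the still-running pod keeps being tracked by the informer as usual.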
### How was this patch tested?
Manual test.
```
2025-06-17 17:50:37.275 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423211008-grectg-stm-17da59fe-caf4-41e4-a12f-6c1ed9a293f9-driver with label: kyuubi-unique-tag=17da59fe-caf4-41e4-a12f-6c1ed9a293f9 in app state FINISHED, marking it as terminated
2025-06-17 17:50:37.278 INFO [main] org.apache.kyuubi.engine.KubernetesApplicationOperation: [KubernetesInfo(Some(28),Some(dls-prod))] Found existing pod kyuubi-xb406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-5b406fc5-7b0b-4fdf-8531-929ed2ae250d-8998-90c0b328-930f-11ed-a1eb-0242ac120002-0-20250423212011-gpdtsi-stm-6a23000f-10be-4a42-ae62-4fa2da8fac07-driver with label: kyuubi-unique-tag=6a23000f-10be-4a42-ae62-4fa2da8fac07 in app state FINISHED, marking it as terminated
```
The pods are cleaned up eventually.
<img width="664" alt="image" src="https://github.com/user-attachments/assets/8cf58f61-065f-4fb0-9718-2e3c00e8d2e0" />
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7101 from turboFei/pod_cleanup.
Closes #7101
7f76cf57c [Wang, Fei] async
11c9db25d [Wang, Fei] cleanup
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
Respect the terminated app state when building batch info from metadata.
It is a followup for https://github.com/apache/kyuubi/pull/2911,
9e40e39c39/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/v1/BatchesResource.scala (L128-L142)
1. The Kyuubi instance is unreachable during a maintenance window.
2. The batch app state has been terminated, and the app state was backfilled by another Kyuubi instance peer (see #2911).
3. The batch state in the metadata table is still PENDING/RUNNING.
4. For such a case, return the terminated batch state instead of `PENDING` or `RUNNING`.
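The decision in the list above can be sketched like this (a minimal sketch with assumed state names; the exact mapping from app state to batch state in Kyuubi may differ):

```python
TERMINAL_APP_STATES = {"FINISHED", "FAILED", "KILLED"}

def effective_batch_state(metadata_state, app_state):
    """Prefer a terminal app state over a stale PENDING/RUNNING
    metadata state backfilled by a peer instance."""
    if metadata_state in ("PENDING", "RUNNING") and app_state in TERMINAL_APP_STATES:
        # Assumed mapping: a successful app maps to FINISHED,
        # anything else to ERROR.
        return "FINISHED" if app_state == "FINISHED" else "ERROR"
    return metadata_state
```

For example, a batch whose metadata still says `RUNNING` but whose application was backfilled as `FAILED` would be reported as `ERROR` rather than `RUNNING`.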
### How was this patch tested?
GA and IT.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7095 from turboFei/always_respect_appstate.
Closes #7095
ec72666c9 [Wang, Fei] rename
bc74a9c56 [Wang, Fei] if op not terminated
e786c8d9b [Wang, Fei] respect terminated app state when building batch info from metadata
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
To show how many metadata records were cleaned up.
### How was this patch tested?
```
(base) ➜ kyuubi git:(delete_metadata) grep 'Cleaned up' target/unit-tests.log
01:58:17.109 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
01:58:17.124 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 0 records older than 1000 ms from metadata.
...
01:58:18.108 ScalaTest-run-running-JDBCMetadataStoreSuite INFO JDBCMetadataStore: Cleaned up 1 records older than 1000 ms from metadata.
01:58:18.162 ScalaTest-run INFO JDBCMetadataStore: Cleaned up 0 records older than 0 ms from k8s_engine_info.
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7093 from turboFei/delete_metadata.
Closes #7093
e0cf300f8 [Wang, Fei] update
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
The metrics `kyuubi_operation_state_LaunchEngine_*` cannot reflect the state of the startup `Semaphore` after configuring the maximum engine startup limit through `kyuubi.server.limit.engine.startup`. This change adds metrics to expose the relevant permit state.
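A minimal sketch of what such permit metrics look like, using a plain semaphore and hypothetical metric names (the actual Kyuubi metric keys may differ):

```python
import threading

class StartupPermits:
    """Wraps a startup-limit semaphore and exposes its permit state
    as gauge-style metrics (hypothetical names)."""

    def __init__(self, max_startups):
        self.max_startups = max_startups
        self._sem = threading.Semaphore(max_startups)
        self._in_use = 0
        self._lock = threading.Lock()

    def acquire(self):
        self._sem.acquire()
        with self._lock:
            self._in_use += 1

    def release(self):
        with self._lock:
            self._in_use -= 1
        self._sem.release()

    def metrics(self):
        # Snapshot of the permit state, suitable for gauge registration.
        with self._lock:
            return {
                "engine_startup_permits_total": self.max_startups,
                "engine_startup_permits_in_use": self._in_use,
                "engine_startup_permits_available": self.max_startups - self._in_use,
            }
```

With two total permits and one engine starting up, `metrics()` reports one available permit.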
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
Closes #7072 from LennonChin/engine_startup_metrics.
Closes #7072
d6bf3696a [Lennon Chin] Expose metrics of engine startup permit status
Authored-by: Lennon Chin <i@coderap.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
As clarified in https://github.com/apache/kyuubi/issues/6926, there are scenarios where users want to launch an engine on each Kyuubi server. The SERVER_LOCAL engine share level implements this by using the local host address as the subdomain, in which case each Kyuubi server's engine is unique.
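A rough sketch of the idea, with an invented namespace layout (the real engine space format in Kyuubi is different, and the host would be resolved from the server itself):

```python
def engine_space(share_level, user, local_host, subdomain=None):
    """SERVER_LOCAL behaves like SERVER but pins the subdomain to the
    local host address, so each server launches its own engine."""
    if share_level == "SERVER_LOCAL":
        # Host address becomes the subdomain, unique per Kyuubi server.
        subdomain = local_host.replace(".", "-")
        share_level = "SERVER"
    return f"/kyuubi_{share_level}/{user}/{subdomain or 'default'}"
```

Two servers at `10.0.0.5` and `10.0.0.6` would thus compute distinct spaces and never share an engine.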
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #7013 from taylor12805/share_level_server_local.
Closes #6926
ba201bb72 [taylor.fan] [KYUUBI #6926] update format
42f0a4f7d [taylor.fan] [KYUUBI #6926] move host address to subdomain
e06de79ad [taylor.fan] [KYUUBI #6926] Add SERVER_LOCAL engine share level
Authored-by: taylor.fan <taylor.fan@vipshop.com>
Signed-off-by: Kent Yao <yao@apache.org>
### Why are the changes needed?
1. Persist the Kubernetes application terminate info into the metastore to prevent event loss.
2. If the application info cannot be found in the informer application info store, fall back to the metastore instead of returning NOT_FOUND directly.
3. This is critical because returning a false application state might cause data quality issues.
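The fallback order can be sketched as follows (dicts standing in for the informer cache and the metastore; names are illustrative):

```python
def get_application_info(tag, informer_store, metastore):
    """Prefer the live informer cache; fall back to the persisted
    terminate info before ever reporting NOT_FOUND."""
    info = informer_store.get(tag)
    if info is not None:
        return info
    # Informer may have missed events (e.g. server restart): consult
    # the terminate info persisted in the metastore.
    info = metastore.get(tag)
    if info is not None:
        return info
    return {"state": "NOT_FOUND"}
```

Only when both sources are empty does the lookup report `NOT_FOUND`, which avoids falsely failing a batch whose pod events were lost.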
### How was this patch tested?
UT and IT.
<img width="1917" alt="image" src="https://github.com/user-attachments/assets/306f417c-5037-4869-904d-dcf657ff8f60" />
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7029 from turboFei/kubernetes_state.
Closes #7028
9f2badef3 [Wang, Fei] generic dialect
186cc690d [Wang, Fei] nit
82ea62669 [Wang, Fei] Add pod name
4c59bebb5 [Wang, Fei] Refine
327a0d594 [Wang, Fei] Remove create_time from k8s engine info
12c24b1d0 [Wang, Fei] do not use MYSQL deprecated VALUES(col)
becf9d1a7 [Wang, Fei] insert or replace
d167623c1 [Wang, Fei] migration
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
Add an option to construct the batch info from metadata directly instead of redirecting the request, to reduce RPC latency.
### How was this patch tested?
Minor change and Existing GA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7043 from turboFei/support_no_redirect.
Closes #7043
7f7a2fb80 [Wang, Fei] comments
bb0e324a1 [Wang, Fei] save
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
Followup for #7034 to fix SparkOnKubernetesTestsSuite.
Sorry, I forgot that the appInfo name and pod name were previously deeply coupled: the appInfo name was used as the pod name and was used to delete the pod.
In this PR, we add `podName` to applicationInfo to separate the app name from the pod name.
### How was this patch tested?
GA should pass.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7039 from turboFei/fix_test.
Closes #7034
0ff7018d6 [Wang, Fei] revert
18e48c079 [Wang, Fei] comments
19f34bc83 [Wang, Fei] do not get pod name from appName
c1d308437 [Wang, Fei] reduce interval for test stability
50fad6bc5 [Wang, Fei] fix ut
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
To fix an NPE.
Previously, we used the method below to get `metadataManager`:
```
private def metadataManager = KyuubiServer.kyuubiServer.backendService
.sessionManager.asInstanceOf[KyuubiSessionManager].metadataManager
```
But before the Kyuubi server has fully restarted, `KyuubiServer.kyuubiServer` is null and might throw an NPE during the batch recovery phase.
For example:
```
2025-04-23 14:06:24.040 ERROR [KyuubiSessionManager-exec-pool: Thread-231] org.apache.kyuubi.engine.KubernetesApplicationOperation: Failed to get application by label: kyuubi-unique-tag=95116703-4240-4cc1-9886-ccae3a2ac879, due to Cannot invoke "org.apache.kyuubi.server.KyuubiServer.backendService()" because the return value of "org.apache.kyuubi.server.KyuubiServer$.kyuubiServer()" is null
```
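The null-safe shape of the fix can be sketched like this (nested dicts standing in for the Scala object graph; purely illustrative):

```python
def metadata_manager(kyuubi_server):
    """Return None instead of raising when the server singleton has not
    been initialized yet (e.g. during batch recovery at restart)."""
    if kyuubi_server is None:
        # Server not fully started: caller must handle the absence
        # rather than hit an NPE on kyuubiServer.backendService().
        return None
    return kyuubi_server["backend"]["session_manager"]["metadata_manager"]
```

Callers then treat a `None` result as "metadata manager not available yet" instead of crashing the recovery thread.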
### How was this patch tested?
Existing GA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7041 from turboFei/fix_NPE.
Closes #7041
064d88707 [Wang, Fei] Fix NPE
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
After https://github.com/apache/spark/pull/34460 (since Spark 3.3.0), the `spark-app-name` label is available.
We should use it as the application name if it exists.
### How was this patch tested?
Minor change.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7034 from turboFei/k8s_app_name.
Closes #7034
bfa88a436 [Wang, Fei] Get pod app name
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
I found that, for a Kyuubi batch on Kubernetes:
1. It had reached the `FINISHED` state.
2. Then I deleted the pod manually; checking the k8s-audit.log, the appState became `FAILED`.
```
2025-04-15 11:16:30.453 INFO [-675216314-pool-44-thread-839] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: label=61e7d8c1-e5a9-46cd-83e7-c611003f0224 context=97 namespace=dls-prod pod=kyuubi-spark-61e7d8c1-e5a9-46cd-83e7-c611003f0224-driver podState=Running containers=[microvault->ContainerState(running=ContainerStateRunning(startedAt=2025-04-15T18:13:48Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://72704f8e7ccb5e877c8f6b10bf6ad810d0c019e07e0cb5975be733e79762c1ec, exitCode=0, finishedAt=2025-04-15T18:14:22Z, message=null, reason=Completed, signal=null, startedAt=2025-04-15T18:13:49Z, additionalProperties={}), waiting=null, additionalProperties={})] appId=spark-228c62e0dc37402bacac189d01b871e4 appState=FINISHED appError=''
:2025-04-15 11:16:30.854 INFO [-675216314-pool-44-thread-840] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: label=61e7d8c1-e5a9-46cd-83e7-c611003f0224 context=97 namespace=dls-prod pod=kyuubi-spark-61e7d8c1-e5a9-46cd-83e7-c611003f0224-driver podState=Failed containers=[microvault->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://91654e3ee74e2c31218e14be201b50a4a604c2ad15d3afd84dc6f620e59894b7, exitCode=2, finishedAt=2025-04-15T18:16:30Z, message=null, reason=Error, signal=null, startedAt=2025-04-15T18:13:48Z, additionalProperties={}), waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://72704f8e7ccb5e877c8f6b10bf6ad810d0c019e07e0cb5975be733e79762c1ec, exitCode=0, finishedAt=2025-04-15T18:14:22Z, message=null, reason=Completed, signal=null, startedAt=2025-04-15T18:13:49Z, additionalProperties={}), waiting=null, additionalProperties={})] appId=spark-228c62e0dc37402bacac189d01b871e4 appState=FAILED appError='{
```
This PR is a followup for #6690, which ignores the container state if the pod is terminated.
It is more reasonable to respect the terminated container state than the terminated pod state.
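The precedence rule can be sketched as follows (simplified container/pod states as plain values; the real code inspects the Fabric8 pod model):

```python
def resolve_app_state(pod_state, driver_container_state):
    """Prefer the terminated driver container's result over the pod
    phase: a driver that exited 0 inside a Failed pod (e.g. a sidecar
    errored, or the pod was deleted later) is still FINISHED."""
    if driver_container_state is not None and driver_container_state["terminated"]:
        return "FINISHED" if driver_container_state["exit_code"] == 0 else "FAILED"
    # No terminated driver container: fall back to the pod phase.
    return {"Succeeded": "FINISHED", "Failed": "FAILED"}.get(pod_state, "RUNNING")
```

This matches the scenario in the description: the pod flips to `Failed` after a manual delete, but the driver container had already terminated with exit code 0, so the app stays `FINISHED`.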
### How was this patch tested?
Integration testing.
```
:2025-04-15 13:53:24.551 INFO [-1077768163-pool-36-thread-3] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE label=e0eb4580-3cfa-43bf-bdcc-efeabcabc93c context=97 namespace=dls-prod pod=kyuubi-spark-e0eb4580-3cfa-43bf-bdcc-efeabcabc93c-driver podState=Failed containers=[microvault->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://66c42206730950bd422774e3c1b0f426d7879731788cea609bbfe0daab24a763, exitCode=2, finishedAt=2025-04-15T20:53:22Z, message=null, reason=Error, signal=null, startedAt=2025-04-15T20:52:00Z, additionalProperties={}), waiting=null, additionalProperties={}),spark-kubernetes-driver->ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://9179a73d9d9e148dcd9c13ee6cc29dc3e257f95a33609065e061866bb611cb3b, exitCode=0, finishedAt=2025-04-15T20:52:28Z, message=null, reason=Completed, signal=null, startedAt=2025-04-15T20:52:01Z, additionalProperties={}), waiting=null, additionalProperties={})] appId=spark-578df0facbfd4958a07f8d1ae79107dc appState=FINISHED appError=''
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7025 from turboFei/container_terminated.
Closes #7025
Closes #6686
a3b2a5a56 [Wang, Fei] comments
4356d1bc9 [Wang, Fei] fix the app state logical
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
1. Audit the Kubernetes resource event type.
2. Fix the processing logic for the DELETE event.
Before this PR:
I tried to delete the pod manually, and then saw that Kyuubi thought `appState=PENDING`.
```
2025-04-15 13:58:20.320 INFO [-1077768163-pool-36-thread-7] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE label=3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8 context=97 namespace=dls-prod pod=kyuubi-spark-3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8-driver podState=Pending containers=[] appId=spark-cd125bbd9fc84ffcae6d6b5d41d4d8ad appState=PENDING appError=''
```
It seems that the pod status in the event is a snapshot taken before the pod was deleted.
Then we would not receive any further event for this pod, and finally the batch FINISHED with application `NOT_FOUND`.
<img width="1389" alt="image" src="https://github.com/user-attachments/assets/5df03db6-0924-4a58-9538-b196fbf87f32" />
It seems we need to process the DELETE event specially:
1. Get the app state from the pod/container states.
2. If the application state obtained is terminated, return it directly.
3. Otherwise, the application state should be FAILED, since the pod has been deleted.
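The three steps above can be sketched as a small function (state names assumed from the surrounding description):

```python
TERMINAL_APP_STATES = {"FINISHED", "FAILED", "KILLED"}

def on_delete_event(app_state_from_pod):
    """DELETE handling: the event carries a pre-deletion snapshot, so a
    non-terminal state (e.g. PENDING) cannot be trusted - the pod is
    gone and no further events will arrive."""
    if app_state_from_pod in TERMINAL_APP_STATES:
        return app_state_from_pod  # keep the real terminal outcome
    return "FAILED"                # deleted before terminating: treat as FAILED
```

So the `appState=PENDING` snapshot from the audit log above would now resolve to `FAILED` instead of leaving the batch waiting for events that never come.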
### How was this patch tested?
<img width="1614" alt="image" src="https://github.com/user-attachments/assets/11e64c6f-ad53-4485-b8d2-a351bb23e8ca" />
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7026 from turboFei/k8s_audit.
Closes #7026
4e5695d34 [Wang, Fei] for delete
c16757218 [Wang, Fei] audit the pod event type
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
This ensures the Kyuubi server is promptly informed of any Kubernetes resource changes after startup. It is highly recommended to set it when running multiple Kyuubi instances.
### How was this patch tested?
Existing GA and Integration testing.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7027 from turboFei/k8s_client_init.
Closes #7027
393b9960a [Wang, Fei] server only
a640278c4 [Wang, Fei] refresh
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
Currently, if the session between the client and the Kyuubi server is disconnected without being closed properly, it is difficult to debug, and we have to check the Kyuubi server log.
It would be better to record such information in the Kyuubi session event.
### How was this patch tested?
IT.
<img width="1264" alt="image" src="https://github.com/user-attachments/assets/d2c5b6d0-6298-46ec-9b73-ce648551120c" />
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7015 from turboFei/disconnect.
Closes #7015
c95709284 [Wang, Fei] do not post
e46521410 [Wang, Fei] nit
bca7f9b7e [Wang, Fei] post
1cf6f8f49 [Wang, Fei] disconnect
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
To fix the issue that the batch Kyuubi instance port is negative.
<img width="697" alt="image" src="https://github.com/user-attachments/assets/ef992390-8d20-44b3-8640-35496caff85d" />
It happened after I stopped the Kyuubi service.
We should use a variable instead of a function for the Jetty server's serverUri.
After the server connector is stopped, the localPort becomes `-2`.

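The fix amounts to snapshotting the connection URL while the connector is live, rather than recomputing it on every access. A minimal sketch (class and property names are illustrative, not the actual Kyuubi API):

```python
class FrontendService:
    """Capture the server URI once at start; do not derive it from a
    connector whose localPort becomes -2 after stop."""

    def __init__(self):
        self._server_uri = None

    def start(self, host, local_port):
        # Snapshot while the connector is running and the port is valid.
        self._server_uri = f"{host}:{local_port}"

    @property
    def server_uri(self):
        # Returns the captured value even after the connector stops.
        return self._server_uri
```

A batch record written during shutdown then still sees the real port instead of a negative sentinel value.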
### How was this patch tested?
Existing UT.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7017 from turboFei/server_port_negative.
Closes #7017
3d34c4031 [Wang, Fei] warn
e58298646 [Wang, Fei] mutable server uri
2cbaf772a [Wang, Fei] Revert "hard code the server uri"
b64d91b32 [Wang, Fei] hard code the server uri
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
Since https://github.com/apache/kyuubi/pull/3618,
the Kyuubi server can retry opening the engine when encountering a specific error.
1937dd93f9/kyuubi-server/src/main/scala/org/apache/kyuubi/session/KyuubiSessionImpl.scala (L177-L212)
The `_client` might be reset and closed.
So we should set `_client` only after the engine session has been opened successfully, since the `client` method is public.
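The publish-only-on-success pattern can be sketched like this (a simplified retry loop; the real Scala code retries on a specific error class, not on every exception):

```python
class EngineSession:
    """Only assign _client once the engine session is open, so concurrent
    readers of the public `client` accessor never see a half-initialized
    or already-closed client during retries."""

    def __init__(self):
        self._client = None

    def open_engine_session(self, connect, max_attempts=3):
        last_err = None
        for _ in range(max_attempts):
            try:
                candidate = connect()        # may fail and be retried
                candidate.open_session()
                self._client = candidate     # publish only after success
                return
            except Exception as e:
                last_err = e
        raise last_err

    @property
    def client(self):
        if self._client is None:
            raise RuntimeError("engine session not opened yet")
        return self._client
```

If the first attempt fails, `_client` stays `None` until a retry succeeds, rather than pointing at a closed client.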
### How was this patch tested?
Existing UT.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #7011 from turboFei/client_ready.
Closes #7011
3ad57ee91 [Wang, Fei] fix npe
b956394fa [Wang, Fei] close internal engine client
523b48a4d [Wang, Fei] internal client
5baeedec1 [Wang, Fei] Revert "method"
84c808cfb [Wang, Fei] method
8efaa52f6 [Wang, Fei] check engine launched
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
Fix the missing `assert` in `SparkProcessBuilderSuite - spark process builder`.
Fix the flaky test `SparkProcessBuilderSuite - capture error from spark process builder` by increasing `kyuubi.session.engine.startup.maxLogLines` from 10 to 4096. The test fails easily, especially with Spark 4.0, due to the larger error stack trace; for example, https://github.com/apache/kyuubi/actions/runs/13974413470/job/39290129824
```
SparkProcessBuilderSuite:
- spark process builder
- capture error from spark process builder *** FAILED ***
The code passed to eventually never returned normally. Attempted 167 times over 1.5007926256666668 minutes. Last failure message: "org.apache.kyuubi.KyuubiSQLException: Suppressed: org.apache.spark.util.Utils$OriginalTryStackTraceException: Full stacktrace of original doTryWithCallerStacktrace caller
See more: /home/runner/work/kyuubi/kyuubi/kyuubi-server/target/work/kentyao/kyuubi-spark-sql-engine.log.2
at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
at org.apache.kyuubi.engine.ProcBuilder.$anonfun$start$1(ProcBuilder.scala:239)
at java.base/java.lang.Thread.run(Thread.java:1583)
.
FYI: The last 10 line(s) of log are:
25/03/24 12:53:39 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
25/03/24 12:53:39 INFO MemoryStore: MemoryStore cleared
25/03/24 12:53:39 INFO BlockManager: BlockManager stopped
25/03/24 12:53:39 INFO BlockManagerMaster: BlockManagerMaster stopped
25/03/24 12:53:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/03/24 12:53:39 INFO SparkContext: Successfully stopped SparkContext
25/03/24 12:53:39 INFO ShutdownHookManager: Shutdown hook called
25/03/24 12:53:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-18455622-344e-48ac-92eb-4b368c35e697
25/03/24 12:53:39 INFO ShutdownHookManager: Deleting directory /home/runner/work/kyuubi/kyuubi/kyuubi-server/target/work/kentyao/artifacts/spark-7479249b-44a2-4fe5-aa0f-544074f9c356
25/03/24 12:53:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-5ba8250f-1ff2-4e0d-a365-27d7518308e1" did not contain "org.apache.hadoop.hive.ql.metadata.HiveException:". (SparkProcessBuilderSuite.scala:77)
```
### How was this patch tested?
Pass GHA, and verified locally with Spark 4.0.0 RC3 by running tests 10 times with constant success.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #6998 from pan3793/spark-pb-ut.
Closes #6998
a4290b413 [Cheng Pan] harness SparkProcessBuilderSuite
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
We met the following issue with Spark on YARN:
```
spark.yarn.submit.waitAppCompletion=false
kyuubi.engine.yarn.submit.timeout=PT10M
```
Due to a network issue, the application submission was very slow.
It was submitted after 15 minutes.
<img width="1430" alt="image" src="https://github.com/user-attachments/assets/a326c3d1-4d39-42da-b6aa-cad5f8e7fc4b" />
<img width="1350" alt="image" src="https://github.com/user-attachments/assets/8e20056a-bd71-4515-a5e3-f881509a34b2" />
Then the batch failed, moving from the PENDING state to the ERROR state directly, because the application state was NOT_FOUND (it exceeded `kyuubi.engine.yarn.submit.timeout`).
a54ee39ab3/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ApplicationOperation.scala (L99-L106)
<img width="1727" alt="image" src="https://github.com/user-attachments/assets/20a2987c-675c-4136-a107-001f30b1b217" />
Here is the operation event:
<img width="1727" alt="image" src="https://github.com/user-attachments/assets/e2bab9c3-a959-4e2b-a207-813ae6489b30" />
But from the batch log, the current application status should be `PENDING`.
```
2025-03-21 17:36:19.350 INFO [KyuubiSessionManager-exec-pool: Thread-176922] org.apache.kyuubi.operation.BatchJobSubmission: Batch report for bbba09c8-3704-4a87-8394-9bcbbd39cc34, Some(ApplicationInfo(application_1741747369441_2258235,6042072c-e8fa-425d-a6a3-3d5bbb4ec1e3-275732_6042072c-e8fa-425d-a6a3-3d5bbb4ec1e3-275732.e3a34b86-7fc7-43ea-b4a5-1b6f27df54b5.0_20250322002147.stm,PENDING,Some(https://apollo-rno-rm-2.vip.hadoop.ebay.com:50030/proxy/application_1741747369441_2258235/),Some()))
```
So we should retrieve the batch application info after the submission process terminates, before checking whether the application failed, so that we get the current application information and prevent this corner case:
1. The application submission time exceeds `kyuubi.engine.yarn.submit.timeout` and the app state is NOT_FOUND.
2. The application report cannot be obtained before the submission process terminates.
3. The batch state then goes from PENDING to ERROR directly.
Conclusion:
The application state transition was:
UNKNOWN (before submit timeout) -> NOT_FOUND (reached submit timeout) -> processExit -> batchOpError -> PENDING (updateApplicationInfoMetadataIfNeeded) -> UNKNOWN (batchError but app not terminated)
After this PR, it should be:
UNKNOWN (before submit timeout) -> NOT_FOUND (reached submit timeout) -> processExit -> PENDING (after process terminated) -> ....
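The re-check described above can be sketched as follows (a simplified decision function; the real batch submission flow in Kyuubi involves more states and metadata updates):

```python
TERMINAL_APP_STATES = {"FINISHED", "FAILED", "KILLED"}

def final_batch_state(process_exit_code, fetch_app_info):
    """After the submitter process exits, refresh the application info
    before concluding the batch failed, so a slow submission that is now
    PENDING/RUNNING keeps being monitored."""
    info = fetch_app_info()  # re-check AFTER the process terminated
    if info["state"] in TERMINAL_APP_STATES:
        return info["state"]
    if info["state"] == "NOT_FOUND" and process_exit_code != 0:
        return "ERROR"       # genuinely never submitted
    return info["state"]     # e.g. PENDING/RUNNING: keep monitoring
```

A submission that was slow but eventually landed (app now `PENDING`) no longer jumps straight to `ERROR` just because the earlier timed-out probe said `NOT_FOUND`.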
### How was this patch tested?
Existing GA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #6997 from turboFei/app_not_found_v2.
Closes #6997
370cf49e9 [Wang, Fei] v2
912ec28ca [Wang, Fei] nit
3c376f922 [Wang, Fei] log the op ex
d9cbdb87d [Wang, Fei] fix app not found
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
# 🔍 Description
## Issue References 🔗
As title.
Fix an NPE: `cleanupTerminatedAppInfoTrigger` can be set to `null`.
d3520ddbce/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala (L269)
Also shut down the ExecutorService when KubernetesApplicationOperation is stopped.
## Describe Your Solution 🔧
Shut down the thread executor service and add a null check.
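The shape of the fix can be sketched like this (a toy stand-in for the trigger and pool; names mirror the description, not the actual Scala fields):

```python
import concurrent.futures

class KubernetesAppOperation:
    """Guard the cleanup trigger against None and release the thread
    pool when the operation is stopped."""

    def __init__(self):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
        self._cleanup_trigger = object()  # stand-in for the real trigger

    def stop(self):
        trigger = self._cleanup_trigger
        if trigger is not None:           # null check before any use
            self._cleanup_trigger = None  # mark as torn down
        self._pool.shutdown(wait=False)   # stop the executor service too
```

After `stop()`, later callers observe `None` and skip the cleanup path instead of dereferencing a torn-down trigger.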
## Types of changes 🔖
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes #6785 from turboFei/npe_k8s.
Closes #6785
6afd052e6 [Wang, Fei] comments
f0c3e3134 [Wang, Fei] prevent npe
9dffe0125 [Wang, Fei] shutdown
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
[[KYUUBI #6984] Fix ValueError when rendering MapType data](https://github.com/apache/kyuubi/issues/6984)
### Why are the changes needed?
The issue was caused by an incorrect iteration of MapType data in the `%table` magic command. When iterating over a `MapType` column, the code used `for k, v in m` directly, which leads to a `ValueError` because raw `Map` entries are not properly unpacked.
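The failure mode can be reproduced in plain Python, since the rendered map arrives as a dict: iterating a dict yields only its keys, so `k, v` tries to unpack each key string instead of each entry:

```python
m = {"a": "1", "b": "2"}

# Buggy pattern: `for k, v in m` iterates keys, so Python attempts
# `k, v = "a"`, which raises ValueError for keys not of length 2
# (and silently splits two-character keys into characters).
try:
    rows = [(k, v) for k, v in m]
except ValueError:
    rows = None  # "not enough values to unpack (expected 2, got 1)"

# Fixed pattern: iterate over the entries explicitly.
fixed = [(k, v) for k, v in m.items()]
```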
### How was this patch tested?
- [x] Manual testing:
Executed a query with a `MapType` column and confirmed that the `%table` command now renders it without errors.
```python
from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, StringType, IntegerType
spark = SparkSession.builder \
.appName("MapFieldExample") \
.getOrCreate()
data = [
(1, {"a": "1", "b": "2"}),
(2, {"x": "10"}),
(3, {"key": "value"})
]
schema = "id INT, map_col MAP<STRING, STRING>"
df = spark.createDataFrame(data, schema=schema)
df.printSchema()
df2 = df.collect()
```
using `%table` render table
```python
%table df2
```
result
```python
{'application/vnd.livy.table.v1+json': {'headers': [{'name': 'id', 'type': 'INT_TYPE'}, {'name': 'map_col', 'type': 'MAP_TYPE'}], 'data': [[1, {'a': '1', 'b': '2'}], [2, {'x': '10'}], [3, {'key': 'value'}]]}}
```
### Was this patch authored or co-authored using generative AI tooling?
No
**Notice:** this PR was co-authored by DeepSeek-R1.
Closes#6985 from JustFeng/patch-1.
Closes#6984
e0911ba94 [Reese Feng] Update PySparkTests for magic cmd
bc3ce1a49 [Reese Feng] Update PySparkTests for magic cmd
200d7ad9b [Reese Feng] Fix syntax error in dict iteration in magic_table_convert_map
Authored-by: Reese Feng <10377945+JustFeng@users.noreply.github.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from 7.0.3 to 7.0.6.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/moxystudio/node-cross-spawn/blob/master/CHANGELOG.md">cross-spawn's changelog</a>.</em></p>
<blockquote>
<h3><a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.5...v7.0.6">7.0.6</a> (2024-11-18)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>update cross-spawn version to 7.0.5 in package-lock.json (<a href="f700743918">f700743</a>)</li>
</ul>
<h3><a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.4...v7.0.5">7.0.5</a> (2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>fix escaping bug introduced by backtracking (<a href="640d391fde">640d391</a>)</li>
</ul>
<h3><a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.4">7.0.4</a> (2024-11-07)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>disable regexp backtracking (<a href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>) (<a href="5ff3a07d9a">5ff3a07</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="77cd97f3ca"><code>77cd97f</code></a> chore(release): 7.0.6</li>
<li><a href="6717de49ff"><code>6717de4</code></a> chore: upgrade standard-version</li>
<li><a href="f700743918"><code>f700743</code></a> fix: update cross-spawn version to 7.0.5 in package-lock.json</li>
<li><a href="9a7e3b2165"><code>9a7e3b2</code></a> chore: fix build status badge</li>
<li><a href="085268352d"><code>0852683</code></a> chore(release): 7.0.5</li>
<li><a href="640d391fde"><code>640d391</code></a> fix: fix escaping bug introduced by backtracking</li>
<li><a href="bff0c87c8b"><code>bff0c87</code></a> chore: remove codecov</li>
<li><a href="a7c6abc6fe"><code>a7c6abc</code></a> chore: replace travis with github workflows</li>
<li><a href="9b9246e096"><code>9b9246e</code></a> chore(release): 7.0.4</li>
<li><a href="5ff3a07d9a"><code>5ff3a07</code></a> fix: disable regexp backtracking (<a href="https://redirect.github.com/moxystudio/node-cross-spawn/issues/160">#160</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.6">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.
[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by yaooqinn.
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/kyuubi/network/alerts).
</details>
Closes#6814 from dependabot[bot]/dependabot/npm_and_yarn/kyuubi-server/web-ui/cross-spawn-7.0.6.
Closes#6814
10dafbc6e [dependabot[bot]] ⬆️ Bump cross-spawn from 7.0.3 to 7.0.6 in /kyuubi-server/web-ui
Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
Vanilla Spark supports neither a rolling nor an expiration mechanism for `spark.kubernetes.file.upload.path`; if you use a file system that does not support TTL, e.g. HDFS, additional cleanup mechanisms are needed to prevent the files in this directory from growing indefinitely.
This PR proposes to let `spark.kubernetes.file.upload.path` support placeholders `{{YEAR}}`, `{{MONTH}}` and `{{DAY}}` and introduce a switch `kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled` to let Kyuubi server create the directory with 777 permission automatically before submitting Spark application.
For example, the user can set the configurations below in `kyuubi-defaults.conf` to enable monthly rolling for `spark.kubernetes.file.upload.path`:
```
kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled=true
spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}
```
Note that Spark creates a sub-directory `s"spark-upload-${UUID.randomUUID()}"` under `spark.kubernetes.file.upload.path` for each upload, so the administrator still needs to clean up the staging directories periodically.
For example:
```
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-f2b71340-dc1d-4940-89e2-c5fc31614eb4
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-173a8653-4d3e-48c0-b8ab-b7f92ae582d6
hdfs://hadoop-cluster/spark-upload-202501/spark-upload-3b22710f-a4a0-40bb-a3a8-16e481038a63
```
The administrator can safely delete `hdfs://hadoop-cluster/spark-upload-202412` after 2025-01-01.
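The placeholder substitution can be sketched as below (the real implementation is Scala inside the Kyuubi server; the function name and the use of UTC here are illustrative assumptions):

```python
from datetime import datetime, timezone

def render_upload_path(template, now=None):
    """Hypothetical sketch of {{YEAR}}/{{MONTH}}/{{DAY}} substitution
    for spark.kubernetes.file.upload.path; zero-padded like a date."""
    now = now or datetime.now(timezone.utc)
    return (template
            .replace("{{YEAR}}", "%04d" % now.year)
            .replace("{{MONTH}}", "%02d" % now.month)
            .replace("{{DAY}}", "%02d" % now.day))

path = render_upload_path(
    "hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}",
    now=datetime(2024, 12, 15))
# December 2024 renders the monthly-rolling directory name
```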
### How was this patch tested?
New UTs are added.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes#6876 from pan3793/rolling-upload.
Closes#6876
6614bf29c [Cheng Pan] comment
5d5cb3eb3 [Cheng Pan] docs
343adaefb [Cheng Pan] review
3eade8bc4 [Cheng Pan] fix
706989778 [Cheng Pan] docs
38953dc3f [Cheng Pan] Support rolling spark.kubernetes.file.upload.path
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
Address comments: https://github.com/apache/kyuubi/discussions/6877#discussioncomment-11743818
> I guess this is a Kyuubi implementation issue, we just read the content from the kyuubi.kubernetes.authenticate.oauthTokenFile and call ConfigBuilder.withOauthToken, I guess this approach does not support token refresh...
### How was this patch tested?
Existing GA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes#6883 from turboFei/k8s_token_provider.
Closes#6883
69dd28d27 [Wang, Fei] comments
a01040f94 [Wang, Fei] withOauthTokenProvider
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
see https://github.com/apache/kyuubi/issues/6843
If the session manager's ThreadPoolExecutor refuses to execute asyncOperation, then we need to shut down the query-timeout-thread in the catch block.
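The shape of the fix can be sketched in Python (the real code is Scala; `run_async` and the timer callback are illustrative stand-ins, not Kyuubi's actual API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_async(pool, operation, timeout_s, results):
    """Start the per-query timeout timer, then submit the operation;
    if the pool rejects the task, cancel the timer so it cannot leak."""
    timer = threading.Timer(timeout_s, lambda: results.append("timed out"))
    timer.daemon = True
    timer.start()
    try:
        pool.submit(operation)
    except RuntimeError:
        # ThreadPoolExecutor raises RuntimeError after shutdown; without
        # this cancel, every rejected query would leak one timer thread.
        timer.cancel()
        raise
```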
### How was this patch tested?
1. Use `jstack` to view threads on the long-lived engine side.

2. Wait for all SQL statements in the engine to finish executing, then use `jstack` to check the number of query-timeout-thread threads, which should be zero.

### Was this patch authored or co-authored using generative AI tooling?
NO
Closes#6844 from ASiegeLion/master.
Closes#6843
9107a300e [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
4b3417f21 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
ef1f66bb5 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
9e1a015f6 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
78a9fde09 [liupeiyue] [KYUUBI #6843] FIX 'query-timeout-thread' thread leak
Authored-by: liupeiyue <liupeiyue@yy.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
Followup for https://github.com/apache/kyuubi/pull/6866
It would throw an exception if both Thrift binary SSL and Thrift HTTP SSL are enabled.
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes#6872 from turboFei/duplicate_gauge.
Closes#6866
ea356766e [Wang, Fei] prevent conflicts
982f175fd [Wang, Fei] conflicts
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### Why are the changes needed?
Add metrics for SSL keystore expiration, so that we can set up an alert if the keystore will expire within one month.
### How was this patch tested?
Integration testing.
<img width="1721" alt="image" src="https://github.com/user-attachments/assets/f4ef6af6-923b-403c-a80d-06dbb80dbe1c" />
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes#6866 from turboFei/keystore_expire.
Closes#6866
77c6db0a7 [Wang, Fei] Add metrics for SSL keystore expiration time #6866
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
1. Add the metric `kyuubi.operartion.batch_pending_max_elapse` for the max batch pending elapsed time, which is helpful for batch health monitoring; we can send an alert if a batch's pending time grows too long.
2. For the `GET /api/v1/batches` API, limit the max time window for listing batches. This helps when we want to retain more metadata on the Kyuubi server side (for example, 90 days) but only allow users to search the last 7 days. It is optional. Also, if `create_time` is specified, order by `create_time` instead of `key_id`.
68a6f48da5/kyuubi-server/src/main/resources/sql/mysql/metadata-store-schema-1.8.0.mysql.sql (L32)
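The gauge semantics can be sketched as the age of the oldest still-pending batch (an assumption based on the description; the real metric is computed in the Kyuubi server's Scala code):

```python
import time

def batch_pending_max_elapse(pending_create_times_ms, now_ms=None):
    """Illustrative gauge: milliseconds since the oldest pending batch
    was created; 0 when nothing is pending."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not pending_create_times_ms:
        return 0
    # The oldest create time yields the maximum pending elapse.
    return now_ms - min(pending_create_times_ms)

# Two pending batches created 30s and 5s before "now".
elapse = batch_pending_max_elapse([75_000, 100_000], now_ms=105_000)
```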
### How was this patch tested?
GA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes#6829 from turboFei/batch_pending_time.
Closes#6829
ee4f93125 [Wang, Fei] docs
bf8169ad4 [Wang, Fei] comments
f493a2af8 [Wang, Fei] new config
ab7b6db65 [Wang, Fei] ut
168017587 [Wang, Fei] in memory session
510a30b6a [Wang, Fei] batchSearchWindow opt
1e93dd276 [Wang, Fei] save
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
This issue was noticed a few times when the batch `state` was set to `ERROR`, but the `appState` kept a non-terminal state forever (e.g. `RUNNING`), even though the application had finished (in this case, a YARN application).
```json
{
"id": "********",
"user": "****",
"batchType": "SPARK",
"name": "*********",
"appStartTime": 0,
"appId": "********",
"appUrl": "********",
"appState": "RUNNING",
"appDiagnostic": "",
"kyuubiInstance": "*********",
"state": "ERROR",
"createTime": 1725343207318,
"endTime": 1725343300986,
"batchInfo": {}
}
```
It seems that this happens when there is some intermittent failure during the monitoring step and the batch ends with `ERROR`, leaving the application metadata without an update. This can lead to the misinterpretation that the application is still running. We need to set the state to `UNKNOWN` to avoid such errors.
## Describe Your Solution 🔧
This is a simple fix: during the batch metadata update, if the batch state is `ERROR` and the `appState` is not a terminal state, change the `appState` to `UNKNOWN`.
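The check reduces to a small state-normalization step; a Python sketch follows (the state names are illustrative, not the exact Kyuubi `ApplicationState` enum):

```python
# Assumed terminal application states for illustration.
TERMINAL_APP_STATES = {"FINISHED", "FAILED", "KILLED", "NOT_FOUND"}

def normalize_app_state(batch_state, app_state):
    """When a batch errors out while the tracked application state is
    still non-terminal, report UNKNOWN instead of a stale RUNNING."""
    if batch_state == "ERROR" and app_state not in TERMINAL_APP_STATES:
        return "UNKNOWN"
    return app_state
```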
## Types of changes 🔖
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
If there is some error between Kyuubi and the application request (e.g. the YARN client), the batch finishes with the `ERROR` state and the application keeps its last known state (e.g. `RUNNING`).
#### Behavior With This Pull Request 🎉
If there is some error between Kyuubi and the application request (e.g. the YARN client), the batch finishes with the `ERROR` state; if the application is in a non-terminal state, it is forced to the `UNKNOWN` state.
#### Related Unit Tests
I tried to implement a unit test to replicate this behavior but couldn't manage it. We would need to force an exception in the engine request (e.g. `YarnClient.getApplication`), but only after the application reaches the `RUNNING` state, or perhaps block the connection between Kyuubi and the engine.
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6722 from joaopamaral/fix/app-state-on-batch-error.
Closes#6722
8409eacac [Wang, Fei] fix
da8c356a7 [Joao Amaral] format fix
73b77b3f7 [Joao Amaral] use isTerminated
64f96a256 [Joao Amaral] Remove test
1eb80ef73 [Joao Amaral] Remove test
13498fa6b [Joao Amaral] Remove test
60ce55ef3 [Joao Amaral] add todo
3a3ba162b [Joao Amaral] Fix
215ac665f [Joao Amaral] Fix AppState when Engine connection is terminated
Lead-authored-by: Joao Amaral <7281460+joaopamaral@users.noreply.github.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
# 🔍 Description
## Issue References 🔗
This pull request fixes#2112
## Describe Your Solution 🔧
Similar to #2113, the query-timeout-thread should verify the Thrift protocol version. For protocol versions <= HIVE_CLI_SERVICE_PROTOCOL_V8, it should convert TIMEDOUT_STATE to CANCELED.
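The version check can be sketched as below (Python stand-in for the Scala checker; the numeric encoding of the protocol version is an assumption for illustration):

```python
# Illustrative protocol version constant; older HiveServer2 clients
# (<= V8) predate the TIMEDOUT operation state.
HIVE_CLI_SERVICE_PROTOCOL_V8 = 8

def timeout_state_for(protocol_version):
    """Downgrade TIMEDOUT to CANCELED for clients that cannot
    understand TIMEDOUT_STATE, as the query-timeout checker should."""
    if protocol_version <= HIVE_CLI_SERVICE_PROTOCOL_V8:
        return "CANCELED"
    return "TIMEDOUT"
```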
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6787 from lsm1/branch-timer-checker-set-cancel.
Closes#6787
9fbe1ac97 [senmiaoliu] add isHive21OrLower method
0c77c6f6f [senmiaoliu] time checker set cancel state
Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: senmiaoliu <senmiaoliu@trip.com>
# 🔍 Description
## Issue References 🔗
This pull request fixes #
## Describe Your Solution 🔧
Preparing v1.11.0-SNAPSHOT after branch-1.10 cut
```shell
build/mvn versions:set -DgenerateBackupPoms=false -DnewVersion="1.11.0-SNAPSHOT"
(cd kyuubi-server/web-ui && npm version "1.11.0-SNAPSHOT")
```
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6769 from bowenliang123/bump-1.11.
Closes#6769
6db219d28 [Bowen Liang] get latest_branch by sorting version in branch name
465276204 [Bowen Liang] update package.json
81f2865e5 [Bowen Liang] bump
Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
# 🔍 Description
## Issue References 🔗
## Describe Your Solution 🔧
This PR addresses an issue in the ProcessBuilder class where Java options passed as a single string (e.g. `-Dxxx -Dxxx`) do not take effect. The command list must contain these options as individual elements so they are recognized correctly by the Java runtime.
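The core of the problem is tokenization: a command list must carry each option as its own argv element. A Python sketch using `shlex.split` (which also respects quoting, unlike a naive `str.split`):

```python
import shlex

# A single string of JVM options must become separate argv elements,
# otherwise the runtime sees one bogus argument. shlex.split keeps
# quoted substrings (here a path with a space) together.
java_opts = '-Dfoo=bar -Dlog.dir="/tmp/my logs" -Xmx2g'
argv_tail = shlex.split(java_opts)
```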
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6772 from lsm1/branch-fix-processBuilder.
Closes#6772
fb6d53234 [senmiaoliu] fix process builder java opts
Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
# 🔍 Description
## Issue References 🔗
This pull request fixes #
## Describe Your Solution 🔧
Check the uploaded resource files when creating batch via REST API
- add config `kyuubi.batch.resource.file.max.size` for resource file's max size in bytes
- add config `kyuubi.batch.extra.resource.file.max.size` for each extra resource file's max size in bytes
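A sketch of the validation (the config names come from this PR; the function shape and the default limits in bytes are illustrative assumptions):

```python
def check_batch_resources(resource_size, extra_sizes,
                          max_resource=104_857_600,   # illustrative default
                          max_extra=10_485_760):      # illustrative default
    """Reject oversized uploads when creating a batch via the REST API:
    the main resource is checked against kyuubi.batch.resource.file.max.size,
    each extra file against kyuubi.batch.extra.resource.file.max.size."""
    if resource_size > max_resource:
        raise ValueError(
            "resource file exceeds kyuubi.batch.resource.file.max.size")
    for size in extra_sizes:
        if size > max_extra:
            raise ValueError(
                "extra resource file exceeds kyuubi.batch.extra.resource.file.max.size")
```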
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6756 from bowenliang123/resource-maxsize.
Closes#6756
5c409c425 [Bowen Liang] nit
4b16bcfc4 [Bowen Liang] nit
743920d25 [Bowen Liang] check resource file size max size
Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
# 🔍 Description
## Issue References 🔗
Allow delegation tokens to be used and renewed by the YARN ResourceManager (used in the proxy-user mode of the Flink engine; addresses https://github.com/apache/kyuubi/pull/6383#discussion_r1635768060).
## Describe Your Solution 🔧
Set hadoop fs delegation token renewer to empty.
## Types of changes 🔖
- [X] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [X] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6753 from wForget/renewer.
Closes#6753
f2e1f0aa1 [wforget] Set hadoop fs delegation token renewer to empty
Authored-by: wforget <643348094@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
It seems NotAllowedException is intended for HTTP 405 Method Not Allowed, and we currently use the wrong constructor, so the error message we expect is not returned to the client.
It only returns:
```
{"message":"HTTP 405 Method Not Allowed"}
```
This is because the message we used to build the NotAllowedException was treated as the `allowed` methods argument, not as the `message`.

## Describe Your Solution 🔧
We should use ForbiddenException instead, so the error message we expect is visible on the client side.
85dd5a52ef/kyuubi-server/src/main/scala/org/apache/kyuubi/server/api/api.scala (L47-L51)
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
<img width="913" alt="image" src="https://github.com/user-attachments/assets/6c4e836d-a47a-485d-85a3-fd3a35a9e425">
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6750 from turboFei/not_allowed_exception.
Closes#6750
4dd6fc18c [Wang, Fei] Using ForbiddenException instead of NotAllowedException
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
# 🔍 Description
## Issue References 🔗
This pull request fixes #
## Describe Your Solution 🔧
- check that all required extra resource files are uploaded in the POST multipart request as expected, when creating a batch with the REST Batch API
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6731 from bowenliang123/extra-resource-check.
Closes#6731
116a47ea5 [Bowen Liang] update
cd4433a8c [Bowen Liang] update
4852b1569 [Bowen Liang] update
5bb2955e8 [Bowen Liang] update
1696e7328 [Bowen Liang] update
911a9c195 [Bowen Liang] update
042e42d23 [Bowen Liang] update
56dc7fb8a [Bowen Liang] update
Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
# 🔍 Description
## Issue References 🔗
This pull request fixes #
## Describe Your Solution 🔧
- to fix CVE-2024-45812 and CVE-2024-45811, reported by Dependabot security alerts
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6744 from bowenliang123/vite-4.5.4.
Closes#6744
271db1f5c [Bowen Liang] update
Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
# 🔍 Description
## Issue References 🔗
This pull request fixes https://github.com/apache/kyuubi/issues/6704
## Describe Your Solution 🔧
If periodic GC is set to 0, there is no need to perform an explicit GC.
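The behavior can be sketched as follows (the real code is Scala with a lazy thread pool; the function and scheduling mechanism here are illustrative):

```python
import gc
import threading

def start_periodic_gc(interval_s):
    """When the interval is 0, return None and never create a GC thread;
    otherwise schedule a repeating explicit gc.collect()."""
    if interval_s <= 0:
        return None  # periodic GC disabled: no thread is created at all

    def tick():
        gc.collect()
        schedule()  # re-arm for the next interval

    def schedule():
        t = threading.Timer(interval_s, tick)
        t.daemon = True
        t.start()
        return t

    return schedule()
```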
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6725 from taylor12805/master.
Closes#6704
a52ddda62 [Bowen Liang] update doc
b84a32f35 [Bowen Liang] make periodic gc thead pool lazy
2d4bd7c05 [Bowen Liang] update doc in spark style
3e04604b0 [taylor.fan] [KYUUBI #6704] disable periodic gc if set interval to 0
bf20b134b [taylor.fan] [KYUUBI #6704] disable periodic gc if set interval to 0
c2b7c3078 [taylor.fan] [KYUUBI #6704] disable periodic gc if set interval to 0
6182075fc [taylor.fan] [KYUUBI #6704] disable periodic gc if set interval to 0
52b1c078b [taylor.fan] [KYUUBI #6704] disable periodic gc if set interval to 0
ccf19cf24 [taylor.fan] [KYUUBI #6704] disable periodic gc if set interval to 0
affd67c88 [taylor.fan] [KYUUBI #6704] disable periodic gc if set interval to 0
d4ee164d1 [taylor.fan] disable periodic gc if set interval to 0
Lead-authored-by: taylor.fan <taylor.fan@vipshop.com>
Co-authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
This pull request fixes#6720
## Describe Your Solution 🔧
If the pod goes into the OOMKilled state, the application should be marked as KILLED, which is eventually identified as failed via `isFailed`.
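The mapping can be sketched like this (the state names and the exact pod fields inspected are illustrative, not the precise Kyuubi `ApplicationState` logic):

```python
def app_state_from_pod(pod_phase, terminated_reason=None):
    """Map a pod's phase (and its container's termination reason) to an
    application state; OOMKilled must surface as KILLED."""
    if terminated_reason == "OOMKilled":
        return "KILLED"
    if pod_phase == "Succeeded":
        return "FINISHED"
    if pod_phase == "Failed":
        return "FAILED"
    return "RUNNING"

# KILLED belongs to the failed terminal states, so isFailed picks it up.
FAILED_STATES = {"FAILED", "KILLED"}

def is_failed(state):
    return state in FAILED_STATES
```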
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Tested locally, was able to launch new session
<img width="922" alt="kyuubi_new_session" src="https://github.com/user-attachments/assets/b003c86f-484d-40c5-b173-847374a45b1d">
---
**Be nice. Be informative.**
Closes#6721 from Madhukar525722/OOM.
Closes#6720
cd0bdf633 [madlnu] [KYUUBI #6720] K8s pod OOM Killed should be identified as Application failed state
Authored-by: madlnu <madlnu@visa.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>