kyuubi/integration-tests
Cheng Pan 208354c327
[KYUUBI #6028] Exited spark-submit process should not block batch submit queue
# 🔍 Description
## Issue References 🔗

While enabling batch implementation V2 with the following configurations
```
kyuubi.batch.impl.version=2
kyuubi.batch.submitter.enabled=true
kyuubi.batch.submitter.threads=48
spark.master=yarn
spark.submit.deployMode=cluster
spark.yarn.submit.waitAppCompletion=false
```

I found that batch jobs are blocked in the DB queue once a YARN queue runs out of resources. This causes a further problem: subsequent batch jobs destined for other YARN queues are also queued in the DB rather than in their YARN queues.

```
mysql> select state, engine_state, count(1) from metadata where state in ('INITIALIZED', 'PENDING', 'RUNNING') group by state, engine_state;
+-------------+--------------+----------+
| state       | engine_state | count(1) |
+-------------+--------------+----------+
| INITIALIZED | NULL         |      166 |
| PENDING     | NULL         |        1 |
| RUNNING     | PENDING      |      148 |
| RUNNING     | RUNNING      |      415 |
+-------------+--------------+----------+
```

## Describe Your Solution 🔧

The submitter queue, whose size is controlled by `kyuubi.batch.submitter.threads`, is designed to limit `spark-submit` process concurrency: too many concurrent `spark-submit` processes may exhaust the CPU/memory of the Kyuubi server node and eventually crash the service. In Spark YARN cluster mode with `spark.yarn.submit.waitAppCompletion=false`, the local `spark-submit` process exits as soon as the application reaches the ACCEPTED state, even if no resources can be allocated for the AM container. Such a batch no longer consumes local resources, so it should not occupy a slot in the submitter queue.
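The effect on the queue can be illustrated with a toy model. This is only a sketch, not Kyuubi's actual code: the `Batch` class, the `hand_off_to_yarn` function, and the `release_on_process_exit` flag are hypothetical names, and the pre-patch behavior is assumed to hold a submitter slot until the application leaves the pending state in YARN.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Batch:
    """Hypothetical stand-in for a batch row in the metadata store."""
    name: str
    yarn_queue: str


def hand_off_to_yarn(
    batches: List[Batch],
    slots: int,
    starved_queues: Set[str],
    release_on_process_exit: bool,
) -> List[str]:
    """Return the names of batches that reach YARN in this toy model.

    Assumptions (not taken from the Kyuubi source):
    - batches are picked in FIFO order, one submitter slot per batch;
    - apps in `starved_queues` stay pending for the whole simulation;
    - spark.yarn.submit.waitAppCompletion=false, so the local
      spark-submit process has exited once the app is ACCEPTED.
    """
    free_slots = slots
    handed_off: List[str] = []
    for batch in batches:
        if free_slots == 0:
            # Head-of-line blocking: everything behind stays in the DB queue.
            break
        free_slots -= 1
        handed_off.append(batch.name)
        app_left_pending = batch.yarn_queue not in starved_queues
        # Assumed old policy: free the slot only when the app leaves pending.
        # Proposed policy: free it as soon as the local process has exited.
        if release_on_process_exit or app_left_pending:
            free_slots += 1
    return handed_off
```

With two slots, queue `A` starved, and batches `a1`, `a2` (queue `A`) followed by `b1` (queue `B`), the assumed old policy hands off only `a1` and `a2` and leaves `b1` in the DB, while releasing on process exit hands off all three.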

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

Pass GA, and roll out to an internal cluster.

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6028 from pan3793/batch-submit.

Closes #6028

05fcc758f [Cheng Pan] Exited spark-submit process should not block batch submit queue

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-01-30 23:38:52 +08:00
kyuubi-flink-it [KYUUBI #5783] Switch to kyuubi-relocated-hive-service-rpc 2023-12-07 19:55:10 +08:00
kyuubi-gluten-it [KYUUBI #5939] Bump Gluten version to recover Gluten IT 2024-01-05 10:58:11 +08:00
kyuubi-hive-it [KYUUBI #5867] HiveEngine support run on YARN mode 2023-12-29 18:50:12 +08:00
kyuubi-jdbc-it [KYUUBI #5862] Use TestContainerForAll for testing JDBC engine with testcontainers 2023-12-18 21:20:41 +08:00
kyuubi-kubernetes-it [KYUUBI #6028] Exited spark-submit process should not block batch submit queue 2024-01-30 23:38:52 +08:00
kyuubi-trino-it [KYUUBI #5978] Canonicalize Trino IT in GitHub Action workflow 2024-01-16 12:19:31 +08:00
kyuubi-zookeeper-it [KYUUBI #5365] Don't use Log4j2's extended throwable conversion pattern in default logging configurations 2023-10-11 21:41:22 +08:00
pom.xml [KYUUBI #5800] [KYUUBI#5467] Integrate Intel Gluten with Spark engine 2023-12-07 10:47:00 +08:00