kyuubi/docs/deployment
Cheng Pan fff1841054
[KYUUBI #6876] Support rolling spark.kubernetes.file.upload.path
### Why are the changes needed?

The vanilla Spark neither support rolling nor expiration mechanism for `spark.kubernetes.file.upload.path`, if you use file system that does not support TTL, e.g. HDFS, additional cleanup mechanisms are needed to prevent the files in this directory from growing indefinitely.

This PR proposes to let `spark.kubernetes.file.upload.path` support placeholders `{{YEAR}}`, `{{MONTH}}` and `{{DAY}}` and introduce a switch `kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled` to let Kyuubi server create the directory with 777 permission automatically before submitting Spark application.

For example, the user can configure the below configurations in `kyuubi-defaults.conf` to enable monthly rolling support for `spark.kubernetes.file.upload.path`
```
kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled=true
spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}
```

Note that: spark would create sub dir `s"spark-upload-${UUID.randomUUID()}"` under the `spark.kubernetes.file.upload.path` for each uploading, the administer still needs to clean up the staging directory periodically.

For example:
```
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-f2b71340-dc1d-4940-89e2-c5fc31614eb4
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-173a8653-4d3e-48c0-b8ab-b7f92ae582d6
hdfs://hadoop-cluster/spark-upload-202501/spark-upload-3b22710f-a4a0-40bb-a3a8-16e481038a63
```

Administer can safely delete the `hdfs://hadoop-cluster/spark-upload-202412` after 20250101

### How was this patch tested?

New UTs are added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6876 from pan3793/rolling-upload.

Closes #6876

6614bf29c [Cheng Pan] comment
5d5cb3eb3 [Cheng Pan] docs
343adaefb [Cheng Pan] review
3eade8bc4 [Cheng Pan] fix
706989778 [Cheng Pan] docs
38953dc3f [Cheng Pan] Support rolling spark.kubernetes.file.upload.path

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-01-15 01:27:12 +08:00
..
spark [KYUUBI #6545] Deprecate and remove building support for Spark 3.2 2024-07-22 11:59:34 +08:00
engine_lifecycle.md
engine_on_kubernetes.md [KYUUBI #6876] Support rolling spark.kubernetes.file.upload.path 2025-01-15 01:27:12 +08:00
engine_on_yarn.md [KYUUBI #6239] Rename beeline to kyuubi-beeline 2024-04-03 18:35:38 +08:00
engine_share_level.md [KYUUBI #6671] [DOC] Fix typo in ENGINE SHARE LEVEL docs 2024-09-05 14:34:45 +08:00
high_availability_guide.md [KYUUBI #6719] [DOC] Fix a couple of typos 2024-09-29 15:15:53 +08:00
hive_metastore.md [KYUUBI #6239] Rename beeline to kyuubi-beeline 2024-04-03 18:35:38 +08:00
index.rst [KYUUBI #5154] [Doc] Move configuration docs to the top level 2023-08-11 18:23:08 +08:00
kyuubi_on_kubernetes.md [KYUUBI #6842] Bump Spark 3.5.4 2024-12-23 11:21:45 +08:00
migration-guide.md [KYUUBI #6545] Deprecate and remove building support for Spark 3.2 2024-07-22 11:59:34 +08:00