kyuubi/docs
Cheng Pan fff1841054
[KYUUBI #6876] Support rolling spark.kubernetes.file.upload.path
### Why are the changes needed?

Vanilla Spark supports neither rolling nor an expiration mechanism for `spark.kubernetes.file.upload.path`. If you use a file system that does not support TTL, e.g. HDFS, an additional cleanup mechanism is needed to prevent the files in this directory from growing indefinitely.

This PR lets `spark.kubernetes.file.upload.path` support the placeholders `{{YEAR}}`, `{{MONTH}}` and `{{DAY}}`, and introduces a switch, `kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled`, that lets the Kyuubi server automatically create the directory with 777 permissions before submitting the Spark application.

For example, the user can set the following configurations in `kyuubi-defaults.conf` to enable monthly rolling for `spark.kubernetes.file.upload.path`:
```
kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled=true
spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}
```
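
As a rough illustration of how such a template could resolve, the sketch below substitutes the three placeholders with a given date. The function name `expand_upload_path` is hypothetical and this is not Kyuubi's actual implementation. The four-digit year and two-digit month follow the `202412` example in this message. The zero-padded day is an assumption, as is the choice of clock or time zone used to pick the date.

```python
from datetime import date


def expand_upload_path(template: str, today: date) -> str:
    """Replace {{YEAR}}, {{MONTH}} and {{DAY}} in `template` with fields of `today`.

    Hypothetical helper for illustration only; zero-padding is assumed.
    """
    return (
        template
        .replace("{{YEAR}}", f"{today.year:04d}")
        .replace("{{MONTH}}", f"{today.month:02d}")
        .replace("{{DAY}}", f"{today.day:02d}")
    )


# Monthly rolling: every date within December 2024 maps to the same directory.
print(expand_upload_path(
    "hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}", date(2024, 12, 31)
))
# prints "hdfs://hadoop-cluster/spark-upload-202412"
```

Adding `{{DAY}}` to the template would give daily rolling under the same assumptions.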

Note: Spark creates a subdirectory `s"spark-upload-${UUID.randomUUID()}"` under `spark.kubernetes.file.upload.path` for each upload, so the administrator still needs to clean up the staging directory periodically.

For example:
```
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-f2b71340-dc1d-4940-89e2-c5fc31614eb4
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-173a8653-4d3e-48c0-b8ab-b7f92ae582d6
hdfs://hadoop-cluster/spark-upload-202501/spark-upload-3b22710f-a4a0-40bb-a3a8-16e481038a63
```

The administrator can safely delete `hdfs://hadoop-cluster/spark-upload-202412` after 2025-01-01.
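
The cleanup rule above can be sketched as a small selection function that an external cleanup job might use. Everything here is an assumption for illustration: the helper name `expired_monthly_dirs` is hypothetical, the directory names are assumed to end in `spark-upload-YYYYMM` as in the monthly example, and the actual listing and deletion against HDFS are left out.

```python
import re
from datetime import date

# Matches a monthly rolling directory such as ".../spark-upload-202412".
_MONTHLY_DIR = re.compile(r"spark-upload-(\d{4})(0[1-9]|1[0-2])$")


def expired_monthly_dirs(paths, today):
    """Return the monthly upload directories whose month ended before `today`.

    A directory for month M is selected only when `today` is strictly after
    the first day of the following month, mirroring the rule stated above.
    """
    expired = []
    for path in paths:
        match = _MONTHLY_DIR.search(path.rstrip("/"))
        if match is None:
            continue  # not a monthly rolling directory, leave it alone
        year, month = int(match.group(1)), int(match.group(2))
        if month == 12:
            rollover = date(year + 1, 1, 1)
        else:
            rollover = date(year, month + 1, 1)
        if today > rollover:
            expired.append(path)
    return expired


dirs = [
    "hdfs://hadoop-cluster/spark-upload-202412",
    "hdfs://hadoop-cluster/spark-upload-202501",
]
print(expired_monthly_dirs(dirs, date(2025, 1, 2)))
# prints ['hdfs://hadoop-cluster/spark-upload-202412']
```

In practice an operator may prefer a longer grace period than one day, since applications submitted late in a month can still be running after the rollover.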

### How was this patch tested?

New UTs are added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6876 from pan3793/rolling-upload.

Closes #6876

6614bf29c [Cheng Pan] comment
5d5cb3eb3 [Cheng Pan] docs
343adaefb [Cheng Pan] review
3eade8bc4 [Cheng Pan] fix
706989778 [Cheng Pan] docs
38953dc3f [Cheng Pan] Support rolling spark.kubernetes.file.upload.path

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-01-15 01:27:12 +08:00

| Name | Last commit | Date |
|------|-------------|------|
| _static/css | Revert "[KYUUBI #5908] [DOCS] Remove workaround for malformed table" | 2023-12-24 01:53:05 +08:00 |
| appendix | [KYUUBI #4655] [DOCS] Enrich docs for Kyuubi Hive JDBC driver | 2023-04-03 18:51:27 +08:00 |
| client | [KYUUBI #6734] [DOC] add authentication example in REST API docs | 2024-10-16 13:12:00 +08:00 |
| configuration | [KYUUBI #6876] Support rolling spark.kubernetes.file.upload.path | 2025-01-15 01:27:12 +08:00 |
| connector | [KYUUBI #6804] Bump Iceberg from 1.6.1 to 1.7.0 | 2024-11-14 18:25:09 +08:00 |
| contributing | [KYUUBI #6545] Deprecate and remove building support for Spark 3.2 | 2024-07-22 11:59:34 +08:00 |
| deployment | [KYUUBI #6876] Support rolling spark.kubernetes.file.upload.path | 2025-01-15 01:27:12 +08:00 |
| extensions | [KYUUBI #6842] Bump Spark 3.5.4 | 2024-12-23 11:21:45 +08:00 |
| imgs | [KYUUBI #5914] Update layer diagram on welcome page | 2023-12-25 16:13:48 +08:00 |
| monitor | [KYUUBI #6861] Configuration guide of structured logging for Kyuubi server | 2024-12-25 17:22:53 +08:00 |
| overview | [KYUUBI #6596] Fix typos in architecture page | 2024-08-08 12:12:30 +00:00 |
| quick_start | [KYUUBI #6557] Support Flink 1.20 | 2024-08-05 22:57:39 +08:00 |
| security | [KYUUBI #6728] [DOC] update Authz plugin docs of build command with -am option | 2024-10-16 13:31:14 +08:00 |
| tools | [KYUUBI #6242] Remove block cleaner docs | 2024-04-03 13:39:34 +08:00 |
| conf.py | Revert "[KYUUBI #5908] [DOCS] Remove workaround for malformed table" | 2023-12-24 01:53:05 +08:00 |
| index.rst | [KYUUBI #6068] Remove community section from user docs | 2024-02-21 05:20:42 +00:00 |
| make.bat | | |
| Makefile | | |
| requirements.txt | [KYUUBI #6752] [DOC] Bump doc build requirements | 2024-10-18 10:39:02 +08:00 |