celeborn/docs/configuration
Xianming Lei 0a97ca0aa9 [CELEBORN-1577][PHASE2] QuotaManager should support interrupt shuffle
### What changes were proposed in this pull request?
1. Workers report resourceConsumption to the Master.
2. QuotaManager calculates the resourceConsumption of each app and marks the apps that exceed the quota.
    2.1 When a tenant's resourceConsumption exceeds the tenant's quota, the apps with larger consumption are selected and marked as interrupted.
    2.2 When the cluster's resourceConsumption exceeds the cluster quota, the apps with larger consumption are selected and marked as interrupted.
3. The Master tells the Driver, via heartbeat, whether the app is marked as interrupted.
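
The selection in step 2 can be sketched roughly as follows. This is an illustrative Python sketch, not Celeborn's QuotaManager code: the names `AppUsage`, `pick_largest_until_within`, and `apps_to_interrupt` are hypothetical. It also assumes a greedy policy (interrupt the largest consumers first until the remaining consumption fits the quota) and that an interrupted app no longer counts toward the cluster total. The commit message only says that apps "with larger consumption" are selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppUsage:
    """Hypothetical per-app record built from worker reports."""
    app_id: str
    tenant: str
    disk_bytes: int


def pick_largest_until_within(apps, quota_bytes):
    """Greedily pick the largest consumers until the rest fits the quota."""
    total = sum(a.disk_bytes for a in apps)
    picked = set()
    for app in sorted(apps, key=lambda a: a.disk_bytes, reverse=True):
        if total <= quota_bytes:
            break
        picked.add(app.app_id)
        total -= app.disk_bytes
    return picked


def apps_to_interrupt(apps, tenant_quota_bytes, cluster_quota_bytes):
    """Return the ids of apps to mark as interrupted."""
    interrupted = set()

    # Step 2.1: check each tenant against its own quota.
    by_tenant = {}
    for app in apps:
        by_tenant.setdefault(app.tenant, []).append(app)
    for tenant, tenant_apps in by_tenant.items():
        quota = tenant_quota_bytes.get(tenant)
        if quota is not None:  # tenants without a quota are not limited here
            interrupted |= pick_largest_until_within(tenant_apps, quota)

    # Step 2.2: check the cluster quota over apps not already interrupted
    # (assumption: interrupted apps stop counting toward the cluster total).
    remaining = [a for a in apps if a.app_id not in interrupted]
    interrupted |= pick_largest_until_within(remaining, cluster_quota_bytes)
    return interrupted
```

With three apps using 70, 20, and 50 units, a tenant quota of 50 for the first two, and a cluster quota of 60, this sketch marks the 70-unit app at the tenant level and the 50-unit app at the cluster level.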

### Why are the changes needed?
The current storage quota logic can only limit new shuffles; it cannot limit writes to existing shuffles. In our production environment we see the following scenario: the cluster is small, but a single shuffle of a user's app is large and occupies a lot of disk resources. We want to interrupt such shuffles.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UTs.

Closes #2819 from leixm/CELEBORN-1577-2.

Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2025-03-24 22:05:45 +08:00
client.md [CELEBORN-1894] Allow skipping already read chunks during unreplicated shuffle read retried 2025-03-18 11:37:33 +08:00
columnar-shuffle.md
ha.md [CELEBORN-1400] Bump Ratis version from 2.5.1 to 3.0.1 2024-05-30 17:22:22 +08:00
index.md [MINOR] Add documentation for CELEBORN_NO_DAEMONIZE 2024-12-23 10:31:37 +08:00
master.md [CELEBORN-1577][PHASE2] QuotaManager should support interrupt shuffle 2025-03-24 22:05:45 +08:00
metrics.md [CELEBORN-1745] Remove application top disk usage code 2024-11-28 10:55:34 +08:00
network-module.md [CELEBORN-1353] Document Celeborn security - authentication and SSL support 2024-04-30 14:37:56 +08:00
network.md [MINOR] Change config versions 2025-03-11 07:39:32 +08:00
quota.md [CELEBORN-1577][PHASE2] QuotaManager should support interrupt shuffle 2025-03-24 22:05:45 +08:00
worker.md [CELEBORN-1792][FOLLOWUP] Keep resume for a while after resumeByPinnedMemory 2025-03-05 09:37:59 +08:00