celeborn/common
DDDominik 0ed590dc81 [CELEBORN-1917] Support celeborn.client.push.maxBytesSizeInFlight
### What changes were proposed in this pull request?
Add a size limit on in-flight data by introducing a new configuration, `celeborn.client.push.maxBytesInFlight.perWorker/total`, which defaults to `celeborn.client.push.buffer.max.size * celeborn.client.push.maxReqsInFlight.perWorker/total`.
For backward compatibility, also add a switch: `celeborn.client.push.maxReqsInFlight.enabled`.
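
A minimal sketch of the idea, not the code from this patch: it derives the byte limit from the formula above and admits a push only while both the request count and the byte total stay under their limits. The class `InFlightByteLimiter`, its method names, and the rule that a single oversized request is admitted when nothing else is in flight are all assumptions made for illustration.

```java
/**
 * Hypothetical illustration of a byte-based in-flight limit.
 * Names and admission rules are assumptions, not Celeborn's implementation.
 */
public class InFlightByteLimiter {
    private final int maxReqsInFlight;
    private final long maxBytesInFlight;
    private int reqsInFlight;
    private long bytesInFlight;

    /**
     * @param bufferMaxSize   stands in for celeborn.client.push.buffer.max.size, in bytes
     * @param maxReqsInFlight stands in for celeborn.client.push.maxReqsInFlight.perWorker (or total)
     */
    public InFlightByteLimiter(long bufferMaxSize, int maxReqsInFlight) {
        this.maxReqsInFlight = maxReqsInFlight;
        // Default byte limit as described above: buffer size times request limit.
        this.maxBytesInFlight = bufferMaxSize * maxReqsInFlight;
    }

    public long maxBytesInFlight() {
        return maxBytesInFlight;
    }

    /** Returns true and records the request if it fits under both limits. */
    public synchronized boolean tryAcquire(long requestBytes) {
        boolean underReqLimit = reqsInFlight < maxReqsInFlight;
        // Assumption: let one oversized request through when nothing is in flight,
        // so a request larger than the whole limit cannot stall forever.
        boolean underByteLimit =
            reqsInFlight == 0 || bytesInFlight + requestBytes <= maxBytesInFlight;
        if (!(underReqLimit && underByteLimit)) {
            return false;
        }
        reqsInFlight++;
        bytesInFlight += requestBytes;
        return true;
    }

    /** Called when a push completes, whether it succeeded or failed. */
    public synchronized void release(long requestBytes) {
        reqsInFlight--;
        bytesInFlight -= requestBytes;
    }
}
```

With an illustrative 64 KiB buffer and a request limit of 4, the byte limit is 256 KiB, so a 200,000-byte request followed by a 100,000-byte request is held back on bytes even though only one request is in flight.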

### Why are the changes needed?
Celeborn supports limiting the number of in-flight push requests via `celeborn.client.push.maxReqsInFlight.perWorker/total`. This is a good constraint on memory usage when most requests do not exceed `celeborn.client.push.buffer.max.size`. However, in a vectorized shuffle (such as Blaze or Gluten), a request can be much larger than the max buffer size, leading to too much in-flight data and eventually OOM.

### Does this PR introduce _any_ user-facing change?
Yes, it adds new configs for the client.

### How was this patch tested?
Tested in a local environment.

Closes #3248 from DDDominik/CELEBORN-1917.

Lead-authored-by: DDDominik <1015545832@qq.com>
Co-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: DDDominik <zhuangxian@kuaishou.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2025-07-22 23:07:56 +08:00
..
benchmarks
src [CELEBORN-1917] Support celeborn.client.push.maxBytesSizeInFlight 2025-07-22 23:07:56 +08:00
pom.xml [CELEBORN-1530] support MPU for S3 2024-11-22 15:03:53 +08:00