celeborn/worker/src
codenohup f09482108f [CELEBORN-1867][FLINK] Fix flink client memory leak of TransportResponseHandler#outstandingRpcs for handling addCredit and notifyRequiredSegment response
### What changes were proposed in this pull request?

When I tested the performance of Flink with Celeborn in session mode, using both the shuffle-service plugin and hybrid shuffle integration strategies, I noticed that the task heap continuously grew even when no jobs were running.

The issue arises because the Celeborn client sends addCredit or notifyRequiredSegment requests, expecting a response. This creates a callback and maintains a reference to CelebornBufferStream, SingleInputGate, and StreamTask.
These callbacks are stored in TransportResponseHandler#outstandingRpcs and are cleared upon receiving a response.

However, the Worker does not send a response for these two RPC requests, which leads to a significant memory leak in the client.

### Why are the changes needed?
It may cause flink client memory leak.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Test by rerun TPCDS in Flink session mode.

Closes #3103 from codenohup/celeborn-1867.

Authored-by: codenohup <huangxu.walker@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2025-02-21 13:50:54 +08:00
..
main [CELEBORN-1867][FLINK] Fix flink client memory leak of TransportResponseHandler#outstandingRpcs for handling addCredit and notifyRequiredSegment response 2025-02-21 13:50:54 +08:00
test [CELEBORN-1319] Optimize skew partition logic for Reduce Mode to avoid sorting shuffle files 2025-02-19 16:57:44 +08:00