### What changes were proposed in this pull request? When I tested the performance of Flink with Celeborn in session mode, using both the shuffle-service plugin and hybrid shuffle integration strategies, I noticed that the task heap continuously grew even when no jobs were running. The issue arises because the Celeborn client sends addCredit or notifyRequiredSegment requests, expecting a response. This creates a callback and maintains a reference to CelebornBufferStream, SingleInputGate, and StreamTask. These callbacks are stored in TransportResponseHandler#outstandingRpcs and are cleared upon receiving a response. However, the Worker does not send a response for these two RPC requests, which leads to a significant memory leak in the client. ### Why are the changes needed? It may cause flink client memory leak. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Test by rerun TPCDS in Flink session mode. Closes #3103 from codenohup/celeborn-1867. Authored-by: codenohup <huangxu.walker@gmail.com> Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com> |
||
|---|---|---|
| .. | ||
| src | ||
| pom.xml | ||