### What changes were proposed in this pull request?

Add a `ConcurrentHashMap<Integer, Exception> registerShuffleExceptions` field to `ShuffleClientImpl` to record the reason a shuffle registration failed.

### Why are the changes needed?

A register shuffle request can fail for various reasons, such as `SLOT_NOT_AVAILABLE` or `RESERVE_SLOTS_FAILED`. However, the current exceptions only say that shuffle registration failed and do not give the cause. As a result, the exact reason cannot be determined without checking the LifecycleManager logs.

This PR includes the actual reason for a register shuffle failure in the thrown exception.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test: `org.apache.celeborn.client.ShuffleClientSuiteJ#testRegisterShuffleFailed`

Closes #2590 from jiang13021/register_shuffle_failed_reason.

Authored-by: jiang13021 <jiangyanze.jyz@antgroup.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
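The following is a minimal Java sketch of the idea described above. It assumes that the map is keyed by shuffle ID and that failures arrive as status codes from the LifecycleManager. `RegisterShuffleFailureSketch`, `StatusCode`, `onRegisterShuffleResponse`, and `checkRegistered` are hypothetical names and are not Celeborn's actual API. Only the `registerShuffleExceptions` field type comes from this PR.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: not Celeborn's actual ShuffleClientImpl code.
class RegisterShuffleFailureSketch {
    // Hypothetical status codes; the PR names SLOT_NOT_AVAILABLE and RESERVE_SLOTS_FAILED as example causes.
    enum StatusCode { SUCCESS, SLOT_NOT_AVAILABLE, RESERVE_SLOTS_FAILED }

    // Field type taken from the PR; keying by shuffle ID is an assumption of this sketch.
    // ConcurrentHashMap because several task threads may register or query shuffles at once.
    private final Map<Integer, Exception> registerShuffleExceptions = new ConcurrentHashMap<>();

    // Hypothetical hook invoked when the LifecycleManager answers a register shuffle request.
    void onRegisterShuffleResponse(int shuffleId, StatusCode status) {
        if (status == StatusCode.SUCCESS) {
            // A later successful registration clears any stale failure reason.
            registerShuffleExceptions.remove(shuffleId);
        } else {
            // Record why registration failed so callers can see it later.
            registerShuffleExceptions.put(
                shuffleId, new IOException("Register shuffle failed with status " + status));
        }
    }

    // Hypothetical check: throws with the recorded reason attached as the cause,
    // instead of a bare "register shuffle failed" message.
    void checkRegistered(int shuffleId) throws IOException {
        Exception cause = registerShuffleExceptions.get(shuffleId);
        if (cause != null) {
            throw new IOException("Register shuffle failed for shuffle " + shuffleId, cause);
        }
    }
}
```

Attaching the recorded exception as the `cause` means the failure status (for example `SLOT_NOT_AVAILABLE`) appears in the caller's stack trace rather than only in the LifecycleManager logs.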