celeborn/client
jiang13021 bc18c9ae39 [CELEBORN-1479] Report register shuffle failed reason in exception
### What changes were proposed in this pull request?
Add `ConcurrentHashMap<Integer, Exception> registerShuffleExceptions` to `ShuffleClientImpl` to record the reason a shuffle registration failed.

### Why are the changes needed?
A register shuffle request can fail for various reasons, such as SLOT_NOT_AVAILABLE or RESERVE_SLOTS_FAILED. However, the current exceptions only indicate that the shuffle registration failed, without giving the cause, so the exact reason can only be determined by checking the LifecycleManager logs.
With this PR, the actual reason for a register shuffle failure is included in the thrown exception.
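
The idea can be illustrated with a minimal, self-contained sketch. It is not the code in `ShuffleClientImpl`: the class `RegisterShuffleFailureSketch` and the methods `recordFailure` and `checkRegistered` are hypothetical names chosen for illustration, and only the `registerShuffleExceptions` map and the status names come from the description above.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the actual ShuffleClientImpl.
class RegisterShuffleFailureSketch {
  // Maps shuffleId to the exception describing why its registration failed.
  private final ConcurrentHashMap<Integer, Exception> registerShuffleExceptions =
      new ConcurrentHashMap<>();

  // Assumed helper: remember the failure status (e.g. SLOT_NOT_AVAILABLE) for a shuffle.
  void recordFailure(int shuffleId, String statusName) {
    registerShuffleExceptions.put(
        shuffleId, new IOException("register shuffle status: " + statusName));
  }

  // Assumed helper: throw an exception that carries the recorded reason, if any.
  void checkRegistered(int shuffleId) throws IOException {
    Exception cause = registerShuffleExceptions.get(shuffleId);
    if (cause != null) {
      throw new IOException(
          "Register shuffle failed for shuffle " + shuffleId + ": " + cause.getMessage(),
          cause);
    }
  }
}
```

In this sketch a caller that hits the failure sees the status name in the exception message and can reach the original exception through `getCause()`, rather than having to look it up in the LifecycleManager logs.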

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test: `org.apache.celeborn.client.ShuffleClientSuiteJ#testRegisterShuffleFailed`

Closes #2590 from jiang13021/register_shuffle_failed_reason.

Authored-by: jiang13021 <jiangyanze.jyz@antgroup.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-07-09 17:12:23 +08:00
src [CELEBORN-1479] Report register shuffle failed reason in exception 2024-07-09 17:12:23 +08:00
pom.xml [MINOR] Unifiy license format of pom.xml 2024-03-21 14:34:49 +08:00