celeborn/client
zky.zhoukeyong 2bfbab7a47 [CELEBORN-1320] Use ReviveManager for soft splits
### What changes were proposed in this pull request?
Currently `SOFT_SPLIT` bypasses `ReviveManager` and sends `PartitionSplit` requests to
LifecycleManager individually, which can cause too many messages to pile up in the `inbox`; see the issue
described in https://github.com/apache/incubator-celeborn/pull/2366

This PR routes `SOFT_SPLIT` events through `ReviveManager`, i.e. sends them as batched RPCs. Before this PR,
the max size of `Inbox#messages` reached several hundred in my experiment, where soft splits happen frequently:
```
24/03/11 18:33:05 WARN [rpc-server-4-7] Inbox: last max msg cnt in 1 second: 620
24/03/11 18:33:06 WARN [rpc-server-4-5] Inbox: last max msg cnt in 1 second: 105
24/03/11 18:33:07 WARN [rpc-server-4-14] Inbox: last max msg cnt in 1 second: 94
24/03/11 18:33:08 WARN [rpc-server-4-13] Inbox: last max msg cnt in 1 second: 726
24/03/11 18:33:09 WARN [rpc-server-4-3] Inbox: last max msg cnt in 1 second: 50]
24/03/11 18:33:10 WARN [rpc-server-4-16] Inbox: last max msg cnt in 1 second: 98
24/03/11 18:33:11 WARN [rpc-server-4-12] Inbox: last max msg cnt in 1 second: 83
24/03/11 18:33:12 WARN [rpc-server-4-11] Inbox: last max msg cnt in 1 second: 138
24/03/11 18:33:13 WARN [rpc-server-4-9] Inbox: last max msg cnt in 1 second: 315
24/03/11 18:33:14 WARN [rpc-server-4-4] Inbox: last max msg cnt in 1 second: 787
```

After this PR, the size is reduced by roughly an order of magnitude:
```
24/03/11 18:39:17 WARN [rpc-server-4-5] Inbox: last max msg cnt in 1 second: 30]
24/03/11 18:39:18 WARN [rpc-server-4-12] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:19 WARN [rpc-server-4-19] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:20 WARN [rpc-server-4-15] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:21 WARN [rpc-server-4-3] Inbox: last max msg cnt in 1 second: 10]
24/03/11 18:39:22 WARN [rpc-server-4-20] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:23 WARN [rpc-server-4-12] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:24 WARN [rpc-server-4-24] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:25 WARN [rpc-server-4-9] Inbox: last max msg cnt in 1 second: 10]
24/03/11 18:39:26 WARN [rpc-server-4-13] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:27 WARN [rpc-server-4-2] Inbox: last max msg cnt in 1 second: 10]
24/03/11 18:39:28 WARN [rpc-server-4-2] Inbox: last max msg cnt in 1 second: 80]
```
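
The batching idea behind this change can be illustrated with a minimal sketch. This is not Celeborn's actual `ReviveManager` code: the class `SplitRequestBatcher`, the `SplitRequest` type and its fields, and the `flush` method are all hypothetical names chosen for illustration. Writers only enqueue requests, and a periodic flush drains the queue and sends one message per batch, so the receiver's inbox grows with the number of batches rather than the number of split events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of request batching; not Celeborn's ReviveManager. */
class SplitRequestBatcher {

    /** Illustrative request type; the field names are assumptions. */
    static final class SplitRequest {
        final int shuffleId;
        final int partitionId;
        final int epoch;

        SplitRequest(int shuffleId, int partitionId, int epoch) {
            this.shuffleId = shuffleId;
            this.partitionId = partitionId;
            this.epoch = epoch;
        }
    }

    private final LinkedBlockingQueue<SplitRequest> pending = new LinkedBlockingQueue<>();
    private final Consumer<List<SplitRequest>> batchSender;
    private final int maxBatchSize;

    SplitRequestBatcher(Consumer<List<SplitRequest>> batchSender, int maxBatchSize) {
        this.batchSender = batchSender;
        this.maxBatchSize = maxBatchSize;
    }

    /** Called when a soft split is signalled; only enqueues, sends nothing. */
    void submit(SplitRequest request) {
        pending.add(request);
    }

    /**
     * Drains the queued requests and hands them to the sender in batches of
     * at most maxBatchSize. Returns the number of messages (batches) sent.
     * In a real client this would be invoked by a scheduled thread.
     */
    int flush() {
        int messages = 0;
        List<SplitRequest> batch = new ArrayList<>();
        while (pending.drainTo(batch, maxBatchSize) > 0) {
            batchSender.accept(new ArrayList<>(batch)); // one message per batch
            batch.clear();
            messages++;
        }
        return messages;
    }
}
```

With a batch size of 100, submitting 250 split requests and then flushing once produces 3 messages instead of 250, which is the same kind of reduction the logs above show.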

### Why are the changes needed?
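See above: batching soft-split requests through `ReviveManager` reduces the number of messages queued in the LifecycleManager inbox.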

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GitHub Actions (GA) and manual testing.

Closes #2377 from waitinfuture/1320.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-12 11:50:38 +08:00
src [CELEBORN-1320] Use ReviveManager for soft splits 2024-03-12 11:50:38 +08:00
pom.xml [CELEBORN-713] Local network binding support IP or FQDN 2023-06-27 09:42:11 +08:00