celeborn/common
mingji a4687716d2 [CELEBORN-791] Remove slots allocation simulation in master and use active slots sent from worker's heartbeat
### What changes were proposed in this pull request?
The master no longer simulates slot allocation; it uses the active slots that each worker reports in its heartbeat.

### Why are the changes needed?
I have observed that a new worker may be allocated more slots than other workers when the round-robin slot allocation algorithm is used.
There is a logic error in how the master processes a worker's heartbeat. It updates a disk info's active slots to max(active slots currently recorded for the disk, active slots reported by the worker). If I register a huge shuffle, the master allocates more slots than a disk's max slots and marks the excess as unknown disk slots, but the worker counts the unknown disk slots as active slots and reports them to the master. The slot release logic then cannot distinguish unknown disk slots within a single number, so a release does not decrease the active slots properly.
Because of this gap between the worker's view and the master's view, I think it's OK to remove the slot allocation simulation from the master and use the active slots reported by the worker.
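
The effect of the max() rule can be illustrated with a minimal sketch. It is not the Celeborn implementation: `DiskInfo`, `heartbeat_with_max`, `heartbeat_trust_worker`, and `replay` are hypothetical names, the slot numbers are made up, and the master's release path is left out so that only the heartbeat update is modeled.

```python
from dataclasses import dataclass


@dataclass
class DiskInfo:
    """Hypothetical, simplified stand-in for the master's per-disk record."""
    max_slots: int
    active_slots: int = 0


def heartbeat_with_max(disk: DiskInfo, reported_active: int) -> None:
    # Rule described above: keep the larger of the master's count and the
    # worker's reported count, so a heartbeat can never lower the count.
    disk.active_slots = max(disk.active_slots, reported_active)


def heartbeat_trust_worker(disk: DiskInfo, reported_active: int) -> None:
    # Rule proposed by this change: the worker's reported count replaces
    # whatever the master had recorded.
    disk.active_slots = reported_active


def replay(handler, start_active: int, reports: list) -> int:
    """Apply a sequence of heartbeat reports and return the final count."""
    disk = DiskInfo(max_slots=100, active_slots=start_active)
    for reported in reports:
        handler(disk, reported)
    return disk.active_slots


# Made-up scenario: the worker first reports 120 active slots (including
# unknown disk slots), then reports 0 once the shuffle has finished.
reports = [120, 0]
print(replay(heartbeat_with_max, 100, reports))      # stays at 120
print(replay(heartbeat_trust_worker, 100, reports))  # drops to 0
```

Under these assumptions the max() rule leaves the disk looking fully occupied after the worker has released everything, while taking the worker's value directly lets the master's count follow the worker.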

Before this patch:
<img width="928" alt="Screenshot 2023-07-12 16 51 15" src="https://github.com/apache/incubator-celeborn/assets/4150993/9c8a46d9-26a8-42f5-a956-938273277c9b">

After this patch:
<img width="509" alt="Screenshot 2023-07-12 16 25 52" src="https://github.com/apache/incubator-celeborn/assets/4150993/c49b3d91-14ea-4eb8-9b71-9aab73541faf">

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Unit tests (UT) and testing on a cluster.

Closes #1710 from FMX/CELEBORN-791.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-14 20:40:55 +08:00
..
benchmarks [CELEBORN-744] Add Benchmark framework and ComputeIfAbsentBenchmark 2023-06-29 20:19:30 +08:00
src [CELEBORN-791] Remove slots allocation simulation in master and use active slots sent from worker's heartbeat 2023-07-14 20:40:55 +08:00
pom.xml [CELEBORN-666] Define protobuf-maven-plugin in the root pom.xml 2023-06-12 19:46:46 +08:00