celeborn/server-master/src
Keyong Zhou ebe8793ff7
[ISSUE-450] Fix performance regression (#452)
When comparing TPC-DS 3T results between the main branch and branch-0.1, we found that round-robin allocation on main is slower than on branch-0.1. This is unexpected because the high-level allocation algorithms are basically the same.
```
main roundrobin:   5332s
branch-0.1:        5027s
```
After digging deeper, I found the cause: branch-0.1 first allocates all master locations round-robin and then all slave locations round-robin, whereas the main branch allocates (master, slave) pairs round-robin. As a result, if a worker has two disks, disk1 and disk2, on main all master partitions are allocated on disk1 and all slave partitions on disk2. On branch-0.1, by contrast, disk1 and disk2 each hold the same number of master partitions and the same number of slave partitions.
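The difference between the two orderings can be reproduced with a toy model. The sketch below is not Celeborn's allocator. It assumes two hypothetical workers (`w0`, `w1`), each with two disks, and a per-worker cursor that hands out that worker's disks round-robin. All names (`DiskCursor`, `allocate_pairs`, `allocate_two_phase`) are made up for illustration.

```python
from collections import Counter

# Hypothetical cluster layout, for illustration only.
WORKERS = ["w0", "w1"]
DISKS = ["disk1", "disk2"]


class DiskCursor:
    """Per-worker round-robin cursor over that worker's disks (an assumption)."""

    def __init__(self):
        self.next_index = {worker: 0 for worker in WORKERS}

    def take(self, worker):
        index = self.next_index[worker]
        self.next_index[worker] = (index + 1) % len(DISKS)
        return DISKS[index]


def allocate_pairs(num_partitions):
    """Main-branch style: allocate (master, slave) pairs round-robin."""
    cursor = DiskCursor()
    counts = Counter()
    for p in range(num_partitions):
        master = WORKERS[p % len(WORKERS)]
        slave = WORKERS[(p + 1) % len(WORKERS)]  # slave goes to the other worker
        counts[(master, cursor.take(master), "master")] += 1
        counts[(slave, cursor.take(slave), "slave")] += 1
    return counts


def allocate_two_phase(num_partitions):
    """branch-0.1 style: all masters round-robin first, then all slaves."""
    cursor = DiskCursor()
    counts = Counter()
    for p in range(num_partitions):
        master = WORKERS[p % len(WORKERS)]
        counts[(master, cursor.take(master), "master")] += 1
    for p in range(num_partitions):
        slave = WORKERS[(p + 1) % len(WORKERS)]
        counts[(slave, cursor.take(slave), "slave")] += 1
    return counts


# Print the per-(worker, disk, role) partition counts for both strategies.
for name, allocate in (("pairs", allocate_pairs), ("two-phase", allocate_two_phase)):
    print(name)
    for key, count in sorted(allocate(8).items()):
        print("  ", key, count)
```

In this model, each worker alternates between a master and a slave allocation under the pair-wise ordering, so its disk cursor always gives masters one disk and slaves the other. Allocating in two phases breaks that alignment, and every disk receives an even share of both roles.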
Experiments show that changing the main branch algorithm to match branch-0.1 brings the performance back.
Time of q74:
```
branch-0.1:          58.749s
main before fix:     70.114s
main after fix:      58.987s
```
2022-08-24 19:32:42 +08:00
main [ISSUE-450] Fix performance regression (#452) 2022-08-24 19:32:42 +08:00
test [ISSUE-418][BUG] Start master/worker should respect rpc port setting (#419) 2022-08-22 17:18:23 +08:00