celeborn/client-spark
SteNicholas ac5fad53b6 [CELEBORN-1287] Improve both combine and sort operation of shuffle read for CelebornShuffleReader
### What changes were proposed in this pull request?

Improve the combine and sort operations of shuffle read in `CelebornShuffleReader` to reduce the number of spills to disk.

### Why are the changes needed?

After the shuffle reader fetches a block, it first performs a combine operation and then a sort operation. Both combine and sort may spill temporary files to disk, so performance can be poor when a shuffle read uses both. The combine can instead be performed during the sort, which avoids the combine spill files entirely.
Backport: [[SPARK-46512][CORE] Optimize shuffle reading when both sort and combine are used](https://github.com/apache/spark/pull/44512)
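
The idea can be illustrated with a small, language-neutral sketch. This is not the Celeborn or Spark implementation: `sort_with_combine`, `max_in_memory`, and the in-memory list of "runs" that stands in for spill files are all hypothetical names chosen for illustration. It only shows how a single sort structure can combine values both while buffering and while merging sorted runs, so a separate combine pass (and its spill files) is not needed.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
C = TypeVar("C")


def sort_with_combine(
    records: Iterable[Tuple[K, V]],
    create_combiner: Callable[[V], C],
    merge_value: Callable[[C, V], C],
    merge_combiners: Callable[[C, C], C],
    max_in_memory: int,
) -> List[Tuple[K, C]]:
    """Combine values per key while sorting, using one set of sorted runs.

    Hypothetical sketch: each "run" stands in for a spill file and is
    kept in memory here so the example stays self-contained.
    """
    runs: List[List[Tuple[K, C]]] = []
    buffer: Dict[K, C] = {}

    for key, value in records:
        if key in buffer:
            # Combine in place: no extra entry, no extra spill pressure.
            buffer[key] = merge_value(buffer[key], value)
        else:
            if len(buffer) >= max_in_memory:
                # Buffer is full: emit one sorted run and start over.
                runs.append(sorted(buffer.items(), key=lambda kv: kv[0]))
                buffer = {}
            buffer[key] = create_combiner(value)

    if buffer:
        runs.append(sorted(buffer.items(), key=lambda kv: kv[0]))

    # K-way merge of the sorted runs; equal keys arrive adjacently,
    # so partial combiners can be merged on the fly.
    result: List[Tuple[K, C]] = []
    for key, combiner in heapq.merge(*runs, key=lambda kv: kv[0]):
        if result and result[-1][0] == key:
            result[-1] = (key, merge_combiners(result[-1][1], combiner))
        else:
            result.append((key, combiner))
    return result


if __name__ == "__main__":
    data = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
    add = lambda x, y: x + y
    print(sort_with_combine(data, lambda v: v, add, add, max_in_memory=2))
    # prints [('a', 7), ('b', 4), ('c', 4)]
```

In a two-pass approach, the combine step would need its own spillable map before the sort even starts. Here the only structure that can overflow is the sort buffer, which is the saving the change aims for.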

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GitHub Actions (GA) and cluster testing.

Closes #2326 from SteNicholas/CELEBORN-1287.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
2024-02-25 21:59:35 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| common | [CELEBORN-1242] Unify celeborn thread name format | 2024-01-23 16:56:40 +08:00 |
| spark-2 | [CELEBORN-1287] Improve both combine and sort operation of shuffle read for CelebornShuffleReader | 2024-02-25 21:59:35 +08:00 |
| spark-2-shaded | [CELEBORN-1250][FOLLOWUP] Fix license issues | 2024-01-30 16:45:21 +08:00 |
| spark-3 | [CELEBORN-1287] Improve both combine and sort operation of shuffle read for CelebornShuffleReader | 2024-02-25 21:59:35 +08:00 |
| spark-3-columnar-shuffle | [CELEBORN-1150] Revert "[] support io encryption for spark" | 2024-01-04 13:00:58 +08:00 |
| spark-3-shaded | [CELEBORN-1250][FOLLOWUP] Fix license issues | 2024-01-30 16:45:21 +08:00 |