### What changes were proposed in this pull request?

Improve both the combine and the sort operations of shuffle read in `CelebornShuffleReader` to reduce the number of spills to disk.

### Why are the changes needed?

After the shuffle reader obtains a block, it first performs a combine operation and then a sort operation. Both the combine and the sort may spill temporary files to disk, so performance can be poor when a shuffle needs both. Because the combine can instead be performed during the sort, the separate combine spill files can be avoided.

Backport: [[SPARK-46512][CORE] Optimize shuffle reading when both sort and combine are used](https://github.com/apache/spark/pull/44512)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GitHub Actions (GA) CI and testing on a cluster.

Closes #2326 from SteNicholas/CELEBORN-1287.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
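The idea behind "combine during the sort" can be illustrated with a minimal, self-contained sketch. This is not Celeborn's or Spark's code: it is written in Python for brevity, and the names `CombiningSorter`, `insert_all`, `sorted_iterator` and `max_keys_in_memory` are hypothetical. The three aggregation callbacks are loosely modeled on the create-combiner / merge-value / merge-combiners functions of Spark's `Aggregator`, and disk spills are simulated with in-memory lists. Records are aggregated as they are inserted, every spill is already a key-sorted run of combined values, and the final read merges the runs while combining equal keys, so there is only one kind of spill file instead of separate combine and sort spills.

```python
import heapq
from functools import reduce
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


class CombiningSorter:
    """Hypothetical sorter that combines values while it sorts.

    Spilled "files" are plain Python lists; a real implementation would
    serialize each sorted run to a temporary file on disk.
    """

    def __init__(
        self,
        create_combiner: Callable[[Any], Any],
        merge_value: Callable[[Any, Any], Any],
        merge_combiners: Callable[[Any, Any], Any],
        max_keys_in_memory: int,
    ) -> None:
        self.create_combiner = create_combiner
        self.merge_value = merge_value
        self.merge_combiners = merge_combiners
        self.max_keys_in_memory = max_keys_in_memory
        self.buffer: Dict[Any, Any] = {}
        self.spilled_runs: List[List[Tuple[Any, Any]]] = []

    def insert_all(self, records: Iterable[Tuple[Any, Any]]) -> None:
        for key, value in records:
            # Combine on insert: each key holds one partially aggregated value.
            if key in self.buffer:
                self.buffer[key] = self.merge_value(self.buffer[key], value)
            else:
                self.buffer[key] = self.create_combiner(value)
            if len(self.buffer) > self.max_keys_in_memory:
                self._spill()

    def _spill(self) -> None:
        # Each spill is a run that is both combined and sorted by key.
        self.spilled_runs.append(sorted(self.buffer.items(), key=itemgetter(0)))
        self.buffer = {}

    def sorted_iterator(self) -> Iterator[Tuple[Any, Any]]:
        in_memory = sorted(self.buffer.items(), key=itemgetter(0))
        # k-way merge of all sorted runs; equal keys end up adjacent.
        merged = heapq.merge(*self.spilled_runs, in_memory, key=itemgetter(0))
        for key, group in groupby(merged, key=itemgetter(0)):
            # Final combine of the partial aggregates for this key.
            yield key, reduce(self.merge_combiners, (c for _, c in group))


# Usage: a per-key sum, forcing spills with a tiny in-memory limit.
records = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5), ("d", 6), ("a", 7)]
sorter = CombiningSorter(
    create_combiner=lambda v: v,
    merge_value=lambda c, v: c + v,
    merge_combiners=lambda c1, c2: c1 + c2,
    max_keys_in_memory=2,
)
sorter.insert_all(records)
result = list(sorter.sorted_iterator())
print(result)                   # [('a', 13), ('b', 6), ('c', 3), ('d', 6)]
print(len(sorter.spilled_runs))  # 2
```

In a two-phase design, the combine step would spill its own runs and the sort step would then re-read the combined output and spill again; folding the aggregation into the sorter removes the first set of spills. Spark's `ExternalSorter` follows a similar pattern when it is constructed with both an aggregator and a key ordering, although its memory accounting and spill format are far more involved than this sketch.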