### What changes were proposed in this pull request?

This PR fixes a performance degradation, caused by RPC latency, that occurs when Spark's `coalescePartitions` takes effect.

### Why are the changes needed?

I encountered a performance degradation when testing TPC-DS 10T q10:

| | Time |
|---|---|
| ESS | 14s |
| Celeborn | 24s |

After digging into it, I found that q10 triggers partition coalescing. Because I configured `spark.sql.adaptive.coalescePartitions.initialPartitionNum` to 1000, `CelebornShuffleReader` calls `shuffleClient.readPartition` sequentially 1000 times, and the accumulated RPC latency causes the delay.

This PR optimizes the read path by calling `shuffleClient.readPartition` in parallel. After this PR, the q10 time becomes 14s.

### Does this PR introduce _any_ user-facing change?

No, but it introduces a new client-side configuration, `celeborn.client.streamCreatorPool.threads`, which defaults to 32.

### How was this patch tested?

TPC-DS 1T, and it passes GA.

Closes #1876 from waitinfuture/943.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>

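
The idea of replacing sequential `shuffleClient.readPartition` calls with parallel ones can be illustrated with a minimal Java sketch. This is not the actual Celeborn implementation: the class `ParallelStreamCreation`, the method `createStreams`, and the `opener` callback (a stand-in for the real `readPartition` call, whose signature is not shown here) are hypothetical names chosen for illustration. The pool size of 32 only mirrors the stated default of `celeborn.client.streamCreatorPool.threads`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical illustration; not Celeborn's real reader code.
final class ParallelStreamCreation {

    // Opens one stream per partition in [start, end) on a bounded thread pool.
    // 'opener' is an assumed stand-in for a per-partition RPC such as readPartition.
    // Results are returned in partition order regardless of completion order.
    static <T> List<T> createStreams(int start, int end, int threads, IntFunction<T> opener) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int p = start; p < end; p++) {
                final int partitionId = p;
                Callable<T> task = () -> opener.apply(partitionId);
                futures.add(pool.submit(task)); // RPCs now overlap instead of queuing
            }
            List<T> streams = new ArrayList<>(futures.size());
            for (Future<T> f : futures) {
                streams.add(f.get()); // waits for each result, preserving order
            }
            return streams;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Simulates an RPC with a small fixed latency (assumed value, for the demo only).
    static String fakeOpen(int partitionId) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "stream-" + partitionId;
    }

    public static void main(String[] args) {
        // 64 simulated partitions, 32 threads (mirrors the default mentioned above).
        List<String> streams = createStreams(0, 64, 32, ParallelStreamCreation::fakeOpen);
        System.out.println("opened " + streams.size() + " streams");
    }
}
```

With a per-call latency of a few milliseconds, opening the streams one after another costs roughly the latency multiplied by the partition count, while the pooled version divides that by the number of threads. A real client would more likely share one long-lived pool across readers than create a pool per call as this sketch does.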
| | | |
|---|---|---|
| .. | ||
| client.md | ||
| columnar-shuffle.md | ||
| ha.md | ||
| index.md | ||
| master.md | ||
| metrics.md | ||
| network.md | ||
| quota.md | ||
| worker.md | ||