### What changes were proposed in this pull request?

Flink supports fallback to the vanilla Flink built-in shuffle implementation.

### Why are the changes needed?

When quota is insufficient or workers are unavailable, `RemoteShuffleMaster` does not currently support fallback to `NettyShuffleMaster`, and `RemoteShuffleEnvironment` does not support fallback to `NettyShuffleEnvironment`. Flink should support fallback to the vanilla Flink built-in shuffle implementation in both cases.

### Does this PR introduce _any_ user-facing change?

- Introduce the `ShuffleFallbackPolicy` interface to determine whether to fall back to the vanilla Flink built-in shuffle implementation.

```
/**
 * The shuffle fallback policy determines whether fallback to vanilla Flink built-in shuffle
 * implementation.
 */
public interface ShuffleFallbackPolicy {

  /**
   * Returns whether fallback to vanilla flink built-in shuffle implementation.
   *
   * @param shuffleContext The job shuffle context of Flink.
   * @param celebornConf The configuration of Celeborn.
   * @param lifecycleManager The {@link LifecycleManager} of Celeborn.
   * @return Whether fallback to vanilla flink built-in shuffle implementation.
   */
  boolean needFallback(
      JobShuffleContext shuffleContext,
      CelebornConf celebornConf,
      LifecycleManager lifecycleManager);
}
```

- Introduce the `celeborn.client.flink.shuffle.fallback.policy` config to support shuffle fallback policy configuration.

### How was this patch tested?

- `RemoteShuffleMasterSuiteJ#testRegisterJobWithForceFallbackPolicy`
- `WordCountTestBase#celeborn flink integration test with fallback - word count`

Closes #2932 from SteNicholas/CELEBORN-1700.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
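To illustrate how a policy of this shape might be implemented and consulted, here is a minimal, self-contained sketch. It is not the code from this PR: `JobShuffleContext`, `CelebornConf`, and `LifecycleManager` are redeclared as empty or simplified stand-ins so the example compiles on its own, the accessors `isQuotaSufficient()` and `areWorkersAvailable()` are hypothetical, and `QuotaOrWorkerFallbackPolicy` and `chooseShuffleMaster` are names invented for illustration. Only the `needFallback` method shape is taken from the description above.

```java
/** Self-contained sketch; every type below is a stand-in, not the real Flink or Celeborn class. */
final class FallbackPolicySketch {

    /** Stand-in for Flink's JobShuffleContext (left empty in this sketch). */
    static final class JobShuffleContext {}

    /** Stand-in for Celeborn's CelebornConf (left empty in this sketch). */
    static final class CelebornConf {}

    /** Stand-in for Celeborn's LifecycleManager; both accessors are hypothetical. */
    static final class LifecycleManager {
        private final boolean quotaSufficient;
        private final boolean workersAvailable;

        LifecycleManager(boolean quotaSufficient, boolean workersAvailable) {
            this.quotaSufficient = quotaSufficient;
            this.workersAvailable = workersAvailable;
        }

        boolean isQuotaSufficient() {
            return quotaSufficient;
        }

        boolean areWorkersAvailable() {
            return workersAvailable;
        }
    }

    /** Same method shape as the interface described in this PR. */
    interface ShuffleFallbackPolicy {
        boolean needFallback(
                JobShuffleContext shuffleContext,
                CelebornConf celebornConf,
                LifecycleManager lifecycleManager);
    }

    /** Hypothetical policy: fall back when quota is insufficient or no workers are available. */
    static final class QuotaOrWorkerFallbackPolicy implements ShuffleFallbackPolicy {
        @Override
        public boolean needFallback(
                JobShuffleContext shuffleContext,
                CelebornConf celebornConf,
                LifecycleManager lifecycleManager) {
            return !lifecycleManager.isQuotaSufficient()
                    || !lifecycleManager.areWorkersAvailable();
        }
    }

    /** Hypothetical helper: names the shuffle master a job would use under the given policy. */
    static String chooseShuffleMaster(
            ShuffleFallbackPolicy policy, LifecycleManager lifecycleManager) {
        boolean fallback =
                policy.needFallback(new JobShuffleContext(), new CelebornConf(), lifecycleManager);
        return fallback ? "NettyShuffleMaster" : "RemoteShuffleMaster";
    }

    public static void main(String[] args) {
        ShuffleFallbackPolicy policy = new QuotaOrWorkerFallbackPolicy();
        // Healthy cluster: quota is sufficient and workers are available.
        System.out.println(chooseShuffleMaster(policy, new LifecycleManager(true, true)));
        // Degraded cluster: quota is insufficient, so the policy asks for fallback.
        System.out.println(chooseShuffleMaster(policy, new LifecycleManager(false, true)));
    }
}
```

In a real deployment the policy would receive the actual Flink and Celeborn objects, and the decision would be made when the job is registered with the shuffle master; the sketch only shows the decision logic in isolation.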
| Name | | |
|---|---|---|
| common | ||
| spark-2 | ||
| spark-2-shaded | ||
| spark-3 | ||
| spark-3-columnar-common | ||
| spark-3-columnar-shuffle | ||
| spark-3-shaded | ||
| spark-3.5-columnar-shuffle | ||