### _Why are the changes needed?_

In rare cases, a Spark stage hangs forever when `spark.sql.finalWriteStage.eagerlyKillExecutors.enabled` is true. The bug occurs if two conditions are met at the same time:

1. All executors are either removed because of idle timeout or killed by `FinalStageResourceManager`. The target executor number in `YarnAllocator` is then set to 0 and no more executors are launched.
2. The target executor number in `ExecutorAllocationManager` equals the executor number needed by the final stage. `ExecutorAllocationManager` then does not sync its target executor number to `YarnAllocator`.

### _How was this patch tested?_

- [x] Add a new test suite `FinalStageResourceManagerSuite`

Closes #5141 from zhouyifan279/adjust-executors.

Closes #5136

- c4403eefa [zhouyifan279] assert adjustedTargetExecutors == 1
- ea8f24733 [zhouyifan279] Add comment
- 5f3ca1d9c [zhouyifan279] [KYUUBI #5136][Bug] Spark App may hang forever if FinalStageResourceManager killed all executors
- 12687eee7 [zhouyifan279] [KYUUBI #5136][Bug] Spark App may hang forever if FinalStageResourceManager killed all executors
- 9dcbc780d [zhouyifan279] [KYUUBI #5136][Bug] Spark App may hang forever if FinalStageResourceManager killed all executors

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
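
The interaction between the two conditions in the commit message can be illustrated with a toy model. This is only a sketch and not Spark or Kyuubi code: `ClusterManager` and `AllocationManager` are hypothetical stand-ins for `YarnAllocator` and `ExecutorAllocationManager`, and the "sync only when the target changes" rule is a simplification of the behavior described above.

```python
from dataclasses import dataclass


@dataclass
class ClusterManager:
    """Hypothetical stand-in for YarnAllocator."""
    target: int
    running: int

    def kill_all(self) -> None:
        # Condition 1: every executor is gone and the target drops to 0.
        self.running = 0
        self.target = 0

    def allocate(self) -> None:
        # New executors are launched only while running < target.
        self.running = max(self.running, self.target)


@dataclass
class AllocationManager:
    """Hypothetical stand-in for ExecutorAllocationManager."""
    target: int

    def update(self, needed: int, cm: ClusterManager) -> bool:
        # Condition 2: an unchanged target is never sent to the cluster manager.
        if needed == self.target:
            return False
        self.target = needed
        cm.target = needed
        return True


def simulate(am_target: int, needed_by_final_stage: int) -> int:
    """Return how many executors run after all of them were killed."""
    cm = ClusterManager(target=am_target, running=am_target)
    am = AllocationManager(target=am_target)
    cm.kill_all()
    am.update(needed_by_final_stage, cm)
    cm.allocate()
    return cm.running


if __name__ == "__main__":
    print(simulate(1, 1))  # 0: targets already match, no sync, the stage hangs
    print(simulate(4, 1))  # 1: the target changed, so it is synced and an executor starts
```

In this model the hang appears only when both conditions hold. If the allocation manager's target differs from what the final stage needs, the update reaches the cluster manager and executors are launched again.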
| | | |
|---|---|---|
| actions | ||
| ISSUE_TEMPLATE | ||
| workflows | ||
| labeler.yml | ||
| PULL_REQUEST_TEMPLATE | ||