### _Why are the changes needed?_

To close #2102

Support retrying all internal thrift request calls (except RenewDelegationToken for now), and fail fast if the remote engine is unstable or not alive.

This PR adds an engine liveness probe. If it is enabled, a companion thrift client is created and opens a liveness probe session when the remote engine session is opened. It sends a simple thrift request (GetInfo) to check whether the remote engine is alive, and fails fast before retrying if the remote engine is not connectable.

#### Why not use the same thrift client to check engine liveness before retry?

I tried that, but hit an `out of resp sequence` error. For example:

1. send getOperationStatus request
2. read time out
3. send GetInfoType request
4. receive getOperationStatus response (out of resp sequence)

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #2122 from turboFei/retry_rpc.

Closes #2102

3926ba04 [Fei Wang] adress comments
ade4ede6 [Fei Wang] add timeout
1b7a64f9 [Fei Wang] Only check remote engine alive before retry
98e03f8e [Fei Wang] refactor
fac388cf [Fei Wang] remove unused import
9c6d8737 [Fei Wang] add ut
9b595650 [Fei Wang] Support to retry the thrift request and engine alive probe

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Fei Wang <fwang12@ebay.com>
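The retry behavior described in this commit message can be illustrated with a minimal sketch. It is not the Kyuubi implementation: `call_with_retry`, `probe_alive`, `EngineNotAliveError`, and the parameter names are hypothetical, and plain Python callables stand in for the thrift clients. It assumes the probe runs on a separate connection (the companion client), is consulted only before a retry, and stops the retry loop immediately when it reports that the engine is not alive.

```python
import time


class EngineNotAliveError(Exception):
    """Hypothetical error: the liveness probe failed, so retrying is pointless."""


def call_with_retry(request, probe_alive, max_attempts=3, wait_seconds=0.0):
    """Run `request`, retrying on timeout or connection errors.

    `probe_alive` stands in for the companion client's GetInfo call. It is
    checked only before a retry, never before the first attempt.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if attempt > 1:
            # Fail fast instead of retrying against a dead engine.
            if not probe_alive():
                raise EngineNotAliveError("remote engine is not alive") from last_error
            time.sleep(wait_seconds)
        try:
            return request()
        except (TimeoutError, ConnectionError) as err:
            last_error = err
    # All attempts failed even though the probe kept succeeding.
    raise last_error


# Case 1: the request times out twice, the engine stays alive, the third try succeeds.
attempts = []


def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("read timed out")
    return "FINISHED"


result = call_with_retry(flaky_request, probe_alive=lambda: True)

# Case 2: the engine is gone, so the probe fails and no retry is sent.
dead_attempts = []


def dead_request():
    dead_attempts.append(1)
    raise ConnectionError("connection refused")


fast_failed = False
try:
    call_with_retry(dead_request, probe_alive=lambda: False)
except EngineNotAliveError:
    fast_failed = True
```

Keeping the probe on its own connection mirrors the reasoning in the message: a timed-out request may still deliver its response later on the original connection, so a probe sent there could read the wrong reply.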
||
|---|---|---|
| spark | ||
| engine_lifecycle.md | ||
| engine_on_kubernetes.md | ||
| engine_on_yarn.md | ||
| engine_share_level.md | ||
| high_availability_guide.md | ||
| hive_metastore.md | ||
| index.rst | ||
| kyuubi_on_kubernetes.md | ||
| settings.md | ||