kyuubi/docs
Kent Yao fd17dd0ae4
[KYUUBI #1300] Detecting critical errors

### _Why are the changes needed?_

Critical errors on the engine side are not handled properly. For example, when the engine runs out of memory (OOM):
- The server may be unable to get operation statuses because the engine side does not respond. In this case, the client only gets an ambiguous `read timeout` as the final cause.
- The OOM hook on the engine side might crash the engine directly while the server is still trying to get the operation status.

This PR:
- adds a retry config to make the operation status update process more robust
- changes the engine OOM hook so that it only deregisters itself, which lets the engine recover from some transient errors
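
The two changes above can be illustrated with a minimal Python sketch. It is not Kyuubi's implementation: `StatusTimeout`, `get_status_with_retry`, `max_failures`, `wait_seconds`, and the toy `Engine` class with its `registry` set are all hypothetical names chosen for illustration, and the sketch assumes that "deregister" means leaving service discovery while the process keeps running.

```python
import time
from typing import Callable


class StatusTimeout(Exception):
    """Hypothetical stand-in for a read timeout while asking the engine for a status."""


def get_status_with_retry(fetch_status: Callable[[], str],
                          max_failures: int = 3,
                          wait_seconds: float = 0.0) -> str:
    """Poll the engine, tolerating timeouts until max_failures attempts have failed."""
    failures = 0
    while True:
        try:
            return fetch_status()
        except StatusTimeout as err:
            failures += 1
            if failures >= max_failures:
                # Report a clearer cause than a bare read timeout.
                raise RuntimeError(
                    f"engine did not respond after {failures} attempts; "
                    "it may have hit a critical error such as OOM") from err
            time.sleep(wait_seconds)


class Engine:
    """Toy engine: on OOM it leaves the (hypothetical) registry but keeps running."""

    def __init__(self, registry: set, name: str):
        self.registry = registry
        self.name = name
        self.alive = True
        registry.add(name)

    def on_oom(self) -> None:
        # Only deregister so no new sessions are routed here; do not exit,
        # which leaves room to recover if the error turns out to be transient.
        self.registry.discard(self.name)
```

With this shape, a status call that times out a few times and then succeeds still returns normally, while a persistently silent engine produces an error that names the likely cause instead of a bare timeout.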

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #1312 from yaooqinn/1300.

Closes #1300

a715ecca [Kent Yao] add comments
a816b0f0 [Kent Yao] refine
3557c927 [Kent Yao] add comments
2ed8bfb4 [Kent Yao] refine
aefd2a7f [Kent Yao] update doc
f2ea7e4c [Kent Yao] restore SparkOperation
386e4eac [Kent Yao] [KYUUBI #1300] Detecting critical errors

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2021-11-02 11:46:51 +08:00
..
appendix [KYUUBI #951] [LICENSE] Add license header on all docs 2021-08-19 09:53:52 +08:00
client [KYUUBI #951] [LICENSE] Add license header on all docs 2021-08-19 09:53:52 +08:00
community [KYUUBI #1292] Refine release.md and announce.tmpl 2021-10-26 16:10:06 +08:00
deployment [KYUUBI #1300] Detecting critical errors 2021-11-02 11:46:51 +08:00
develop_tools [KYUUBI #951] [LICENSE] Add license header on all docs 2021-08-19 09:53:52 +08:00
imgs [KYUUBI #1305] [DOC] Fix document errors in quick start 2021-10-28 17:41:05 +08:00
integrations [KYUUBI #951] [LICENSE] Add license header on all docs 2021-08-19 09:53:52 +08:00
monitor [KYUUBI #1302] [DOC] Monitoring Kyuubi - Logging System 2021-10-27 19:38:13 +08:00
overview [KYUUBI #460] Upgrade Hive dependency to 2.3.9 2021-10-23 13:47:22 +08:00
quick_start [KYUUBI #1305] [DOC] Fix document errors in quick start 2021-10-28 17:41:05 +08:00
security [KYUUBI #1090] Add deployment document about Hadoop Credentials Manager 2021-09-15 10:02:09 +08:00
sql [KYUUBI #660] Add UDF session_user 2021-10-26 17:12:50 +08:00
tools [KYUUBI #951] [LICENSE] Add license header on all docs 2021-08-19 09:53:52 +08:00
conf.py [KYUUBI #874] [ASF] ASF Publish 2021-08-16 11:48:21 +08:00
index.rst [KYUUBI #951] [LICENSE] Add license header on all docs 2021-08-19 09:53:52 +08:00
make.bat [KYUUBI #874] [ASF] ASF Publish 2021-08-16 11:48:21 +08:00
Makefile [KYUUBI #874] [ASF] ASF Publish 2021-08-16 11:48:21 +08:00
requirements.txt [KYUUBI #874] [ASF] ASF Publish 2021-08-16 11:48:21 +08:00