[DOC] Improve High Availability Guide. Closes #1527 from yaooqinn/hadoc. Authored-by: Kent Yao <yao@apache.org>
Kyuubi High Availability Guide
As an enterprise-class ad-hoc SQL query service built on top of Apache Spark, Kyuubi takes high availability (HA) as a major characteristic, aiming to ensure an agreed level of service availability, such as a higher than normal period of uptime.
Running Kyuubi in HA mode means using groups of computers or containers that host the Kyuubi SQL query service, so that the service can be relied on with a minimum amount of downtime. Kyuubi uses Apache ZooKeeper to organize redundant service instances into groups that provide continuous service when one or more components fail.
Without HA, if a server crashes, Kyuubi is unavailable until the crashed server is fixed. With HA, hardware and software faults are detected automatically, and another Kyuubi service instance is immediately ready to serve without requiring human intervention.
HA Architecture
Currently, Kyuubi supports load balancing to make the whole system highly available.
Load balancing aims to optimize the usage of all Kyuubi service units, maximize throughput, minimize response time, and avoid overloading any single unit. Using multiple Kyuubi service units with load balancing instead of a single unit can increase reliability and availability through redundancy.
Key Benefits
- High concurrency
  - Adding or removing Kyuubi server instances makes it easy to scale up or down to meet the demand of client requests.
- Upgrade smoothly
  - Kyuubi server supports graceful stop. We can delete a k.i. without stopping it immediately. In this case, the k.i. will not accept any new connection requests, only operation requests from existing connections. After all connections are released, it stops.
  - The dependencies of Kyuubi engines are free to change, such as bumping versions, modifying configurations, adding external jars, or relocating to another engine home. Everything will be reloaded during start and stop.
System-side Deployment
When applying HA to a Kyuubi deployment, we need to be aware of the two settings below:
- kyuubi.ha.zookeeper.quorum - the external ZooKeeper cluster address for deploying a k.i.
- kyuubi.ha.zookeeper.namespace - the root directory, a.k.a. the ServerSpace, for deploying a k.i.
For more configurations, please see the HA section of Introduction to the Kyuubi Configurations System.
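As a rough illustration of the two settings, the sketch below prints them in key=value form so they can be reviewed before being added to the server configuration. The function name ha_settings and the zk*.example.com host names are hypothetical placeholders, and where the lines are finally stored is left to the reader.

```shell
#!/bin/sh
# Print the two HA settings in "key=value" form.
# ha_settings is a hypothetical helper; the values passed in are placeholders.
ha_settings() {
  quorum="$1"      # comma-separated ZooKeeper addresses, each as host:port
  namespace="$2"   # the ServerSpace (root directory) shared by the k.i.s
  printf 'kyuubi.ha.zookeeper.quorum=%s\n' "$quorum"
  printf 'kyuubi.ha.zookeeper.namespace=%s\n' "$namespace"
}

# Example: preview the settings for a three-node ZooKeeper cluster.
ha_settings 'zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181' 'kyuubi'
```

Every k.i. that should serve together would need the same pair of values.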
Pseudo mode
When kyuubi.ha.zookeeper.quorum is not configured, a k.i. will start an embedded ZooKeeper service and expose its own address there.
In this pseudo mode, clients can connect to the k.i. through either its raw IP address or the ZooKeeper quorum plus namespace.
However, this mode does not provide high availability.
Production mode
For production deployments, an external ZooKeeper cluster is required for kyuubi.ha.zookeeper.quorum.
In this mode, multiple k.i.s can be registered to the same ServerSpace configured by kyuubi.ha.zookeeper.namespace and serve together.
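To see why several k.i.s in one ServerSpace spread the load, the following sketch simulates a random pick among registered instances. It is only a simulation under stated assumptions, not the driver's actual implementation: pick_instance is a hypothetical helper, and the addresses are made-up placeholders rather than values read from ZooKeeper.

```shell
#!/bin/sh
# Pick one entry at random from a newline-separated list of instance addresses.
# pick_instance is a hypothetical helper used only for this illustration.
pick_instance() {
  # Seed from the current time plus the shell PID so repeated runs differ.
  seed=$(( $(date +%s) + $$ ))
  printf '%s\n' "$1" | awk -v seed="$seed" '
    BEGIN { srand(seed) }
    { lines[NR] = $0 }                                        # remember every address
    END { if (NR > 0) print lines[int(rand() * NR) + 1] }    # index in 1..NR
  '
}

# Placeholder addresses standing in for three registered k.i.s.
instances='10.0.0.1:10009
10.0.0.2:10009
10.0.0.3:10009'

pick_instance "$instances"
```

If one of the three entries disappears from the list, the same pick simply chooses among the remaining two, which is the redundancy that the production mode relies on.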
Client-side Usage
With the Kyuubi Hive JDBC Driver or the vanilla Hive JDBC Driver, a client can specify the service discovery mode in the JDBC connection string, i.e. serviceDiscoveryMode=zooKeeper;, and set zooKeeperNamespace=kyuubi;. The driver then randomly picks one of the Kyuubi service URIs registered under the /kyuubi path on the specified ZooKeeper addresses.
For example,
bin/beeline -u 'jdbc:hive2://10.242.189.214:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n kentyao
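With an external ZooKeeper cluster, the connection string usually lists several ZooKeeper addresses separated by commas. The sketch below assembles such a string in the same shape as the example above. The helper name kyuubi_jdbc_url and the zk*.example.com hosts are hypothetical, and the fallback namespace kyuubi is an assumption taken from that example.

```shell
#!/bin/sh
# Build a ZooKeeper service-discovery JDBC URL.
# kyuubi_jdbc_url is a hypothetical helper; host names below are placeholders.
kyuubi_jdbc_url() {
  zk_quorum="$1"             # comma-separated host:port list of ZooKeeper nodes
  namespace="${2:-kyuubi}"   # assumed fallback, matching the example above
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' \
    "$zk_quorum" "$namespace"
}

url="$(kyuubi_jdbc_url 'zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181')"
echo "$url"
```

The resulting value can be passed to bin/beeline through -u in place of the literal string, with the user name supplied through -n as before.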
