From 66e460dfe2c9e3ff8aca489df982e611bb1c9605 Mon Sep 17 00:00:00 2001
From: sychen
Date: Thu, 13 Jul 2023 13:46:31 +0800
Subject: [PATCH] [KYUUBI #5049] [DOCS] PyHive Kerberos usage doc

### _Why are the changes needed?_

image

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5049 from cxzl25/pyhive_kerberos.

Closes #5049

6c8b0e62b [sychen] pyhive kerberos usage doc

Authored-by: sychen
Signed-off-by: Cheng Pan
---
 docs/client/python/pyhive.md | 40 ++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/client/python/pyhive.md b/docs/client/python/pyhive.md
index 444ec1e81..b5e57ea2e 100644
--- a/docs/client/python/pyhive.md
+++ b/docs/client/python/pyhive.md
@@ -68,3 +68,43 @@ conn = hive.Connection(host=kyuubi_host, port=10009,
                        username='user', password='password', auth='CUSTOM')
 ```
 
+Use Kerberos to connect to Kyuubi.
+
+`kerberos_service_name` must be the name of the service that started the Kyuubi server, usually the part of `kyuubi.kinit.principal` before the first slash.
+
+Note that PyHive does not support passing in `principal` directly; it builds part of the principal from `kerberos_service_name` and `kyuubi_host`.
+
+```python
+# open connection
+conn = hive.Connection(host=kyuubi_host, port=10009, auth="KERBEROS", kerberos_service_name="kyuubi")
+```
+
+If you encounter the following error, install the related packages.
+
+```
+thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'
+```
+
+```bash
+yum install -y cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-md5
+```
+
+Note that PyHive does not support connections based on ZooKeeper HA. Instead, you can connect to ZooKeeper via [Kazoo](https://pypi.org/project/kazoo/) to get the service address.
+
+Code reference: [https://stackoverflow.com/a/73326589](https://stackoverflow.com/a/73326589)
+
+```python
+from pyhive import hive
+import random
+from kazoo.client import KazooClient
+zk = KazooClient(hosts='kyuubi1.xx.com:2181,kyuubi2.xx.com:2181,kyuubi3.xx.com:2181', read_only=True)
+zk.start()
+servers = [kyuubi_server.split(';')[0].split('=')[1].split(':')
+           for kyuubi_server
+           in zk.get_children(path='kyuubi')]
+kyuubi_host, kyuubi_port = random.choice(servers)
+zk.stop()
+print(kyuubi_host, kyuubi_port)
+conn = hive.Connection(host=kyuubi_host, port=kyuubi_port, auth="KERBEROS", kerberos_service_name="kyuubi")
+```
+
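
The two string manipulations described in the patch (taking the part of the principal before the first slash, and pulling host and port out of a ZooKeeper child node name) can be sketched as small standalone helpers. This is only an illustration and not part of the patch: the function names are hypothetical, the `serviceUri=host:port;key=value;...` node layout is an assumption inferred from the chain of `split` calls in the patch, and the sample principal and node name are made up.

```python
from typing import Tuple


def service_name_from_principal(principal: str) -> str:
    """Return the part of a Kerberos principal before the first slash.

    Hypothetical helper: mirrors the rule of thumb given for
    `kerberos_service_name` in the patch.
    """
    # Drop the realm first so a principal without a host part still works.
    primary_and_host = principal.split("@", 1)[0]
    return primary_and_host.split("/", 1)[0]


def parse_service_node(node_name: str) -> Tuple[str, int]:
    """Extract (host, port) from a ZooKeeper child node name.

    Assumes the layout 'serviceUri=host:port;key=value;...', which is what
    the split chain in the patch implies.
    """
    first_field = node_name.split(";", 1)[0]   # 'serviceUri=host:port'
    address = first_field.split("=", 1)[1]     # 'host:port'
    host, port = address.rsplit(":", 1)
    return host, int(port)                     # port converted to an integer


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(service_name_from_principal("kyuubi/host1.example.com@EXAMPLE.COM"))
    print(parse_service_node(
        "serviceUri=host1.example.com:10009;version=1.7.1;sequence=0000000001"))
```

Unlike the snippet in the patch, `parse_service_node` returns the port as an integer; whether that matters depends on how the connection library handles the `port` argument.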