kyuubi/docs
zhouyifan279 a196ace284
[KYUUBI #6199] Support to run HiveSQLEngine on kerberized YARN
# 🔍 Description
## Issue References 🔗

This pull request implements a feature: running HiveSQLEngine on kerberized YARN.

## Describe Your Solution 🔧
Introduced two configs:
- kyuubi.engine.principal
- kyuubi.engine.keytab
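As a minimal sketch, the two new settings could be set in `kyuubi-defaults.conf` as shown below. The principal and keytab path are placeholders. Pairing them with `kyuubi.engine.type HIVE_SQL` is an assumption about how the Hive engine is selected in a given deployment. The YARN deploy settings are not shown.

```properties
# Select the Hive SQL engine (assumed setup; adjust to your deployment)
kyuubi.engine.type        HIVE_SQL
# Placeholder principal and keytab the engine uses to log in to Kerberos
kyuubi.engine.principal   engine_user@EXAMPLE.COM
kyuubi.engine.keytab      /etc/security/keytabs/engine_user.keytab
```

According to the commit log below, these settings are ignored when Hadoop security is not enabled. A warning is logged if Hive runs on YARN without them.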

When submitting to a kerberized YARN cluster, the submitter uploads the file specified by `kyuubi.engine.keytab` to the application's staging directory.
The YARN NodeManager then downloads the keytab to the AM's working directory, and the AM logs in to Kerberos using the principal and keytab.

**Note**
I tried to run HiveSQLEngine with only delegation tokens, but it failed.

Take the SQL `SELECT * FROM a` as an example.
Hive handles this simple TableScan query by reading the table's HDFS files directly.
When Hive invokes `FileInputFormat.getSplits` during the read, `java.io.IOException: Delegation Token can be issued only with kerberos or web authentication` is thrown.
The simplified stack trace from IDEA is shown below:
```
getDelegationToken:734, DFSClient (org.apache.hadoop.hdfs)
getDelegationToken:2072, DistributedFileSystem (org.apache.hadoop.hdfs)
collectDelegationTokens:108, DelegationTokenIssuer (org.apache.hadoop.security.token)
addDelegationTokens:83, DelegationTokenIssuer (org.apache.hadoop.security.token)
obtainTokensForNamenodesInternal:143, TokenCache (org.apache.hadoop.mapreduce.security)
obtainTokensForNamenodesInternal:102, TokenCache (org.apache.hadoop.mapreduce.security)
obtainTokensForNamenodes:81, TokenCache (org.apache.hadoop.mapreduce.security)
listStatus:221, FileInputFormat (org.apache.hadoop.mapred)
getSplits:332, FileInputFormat (org.apache.hadoop.mapred)
getNextSplits:372, FetchOperator (org.apache.hadoop.hive.ql.exec)
getRecordReader:304, FetchOperator (org.apache.hadoop.hive.ql.exec)
getNextRow:459, FetchOperator (org.apache.hadoop.hive.ql.exec)
pushRow:428, FetchOperator (org.apache.hadoop.hive.ql.exec)
fetch:147, FetchTask (org.apache.hadoop.hive.ql.exec)
getResults:2208, Driver (org.apache.hadoop.hive.ql)
getNextRowSet:494, SQLOperation (org.apache.hive.service.cli.operation)
getNextRowSetInternal:105, HiveOperation (org.apache.kyuubi.engine.hive.operation)
```

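As the trace shows, `FileInputFormat.listStatus` calls `TokenCache.obtainTokensForNamenodes`, which asks the NameNode for a new delegation token. The NameNode refuses because the caller did not authenticate with Kerberos.
In theory, this could be solved by adding the AM's delegation tokens to `org.apache.hadoop.hive.ql.exec.FetchOperator.job.credentials`, since a new token is only requested when the job credentials do not already hold one for the NameNode.
In practice, this is impossible without modifying Hive's source code.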

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️
HiveSQLEngine cannot run on a kerberized YARN cluster.

#### Behavior With This Pull Request 🎉
HiveSQLEngine can run on a kerberized YARN cluster.

#### Related Unit Tests

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6199 from zhouyifan279/kerberized-hive-engine-on-yarn.

Closes #6199

383d1cdcb [zhouyifan279] Fix tests
458493a91 [zhouyifan279] Warn if run Hive on YARN without principal and keytab
118afe280 [zhouyifan279] Warn if run Hive on YARN without principal and keytab
41fed0c44 [zhouyifan279] Ignore Principal&Keytab when hadoop security is no enabled.
9e2d86237 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/hive/HiveProcessBuilder.scala
5ae0a3eac [zhouyifan279] Remove redundant checks
5d3013aaf [zhouyifan279] Use principal & keytab in Local mode
5733dfdcb [zhouyifan279] Use principal & keytab in Local mode
85ce9bb7a [zhouyifan279] Use principal & keytab in Local mode
061223dbe [zhouyifan279] Resolve comments
e706936e7 [zhouyifan279] Resolve comments
f84c7bccc [zhouyifan279] Support run HiveSQLEngine on kerberized YARN
4d262c847 [zhouyifan279] Support run HiveSQLEngine on kerberized YARN

Lead-authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-03-22 16:18:43 +08:00
_static/css Revert "[KYUUBI #5908] [DOCS] Remove workaround for malformed table" 2023-12-24 01:53:05 +08:00
appendix [KYUUBI #4655] [DOCS] Enrich docs for Kyuubi Hive JDBC driver 2023-04-03 18:51:27 +08:00
client [KYUUBI #6000] Modify the incorrect configuration file in the trino-cli documentation 2024-01-19 19:27:05 +08:00
configuration [KYUUBI #6199] Support to run HiveSQLEngine on kerberized YARN 2024-03-22 16:18:43 +08:00
connector [KYUUBI #6119] [DOC] Update doc for HA/Zookeeper Configuration 2024-03-04 05:38:52 +00:00
contributing [KYUUBI #6163] Set default Spark version to 3.5 2024-03-12 16:22:37 +08:00
deployment [KYUUBI #6142] Deprecate Flink 1.16 2024-03-08 10:21:51 +08:00
extensions [KYUUBI #6163] Set default Spark version to 3.5 2024-03-12 16:22:37 +08:00
imgs [KYUUBI #5914] Update layer diagram on welcome page 2023-12-25 16:13:48 +08:00
monitor [KYUUBI #6091] Deprecate and remove building support for Spark 3.1 2024-03-04 20:23:06 +08:00
overview [KYUUBI #4624] [Docs] Fix table headers in kyuubi_vs_hive.md 2023-03-28 16:40:35 +08:00
quick_start [KYUUBI #6191] Update docs to mention support of Flink 1.19 2024-03-18 21:56:32 +08:00
security [KYUUBI #5427] [AUTHZ] Shade spark authz plugin 2023-10-20 20:10:34 +08:00
tools [KYUUBI #6119] [DOC] Update doc for HA/Zookeeper Configuration 2024-03-04 05:38:52 +00:00
conf.py Revert "[KYUUBI #5908] [DOCS] Remove workaround for malformed table" 2023-12-24 01:53:05 +08:00
index.rst [KYUUBI #6068] Remove community section from user docs 2024-02-21 05:20:42 +00:00
make.bat [KYUUBI #4235] [DOCS] Prefer https:// URLs in docs 2023-02-03 14:01:11 +08:00
Makefile
requirements.txt [KYUUBI #5902] Bump doc build dependencies 2023-12-21 18:37:43 -08:00