### Why are the changes needed?
Previously, the `HiveScan` class was used to read data. If it is determined to be PARQUET type, the `ParquetScan` from Spark datasourcev2 can be used. `ParquetScan` supports pushfilter down, but `HiveScan` does not yet support it.
The conversation can be controlled by setting `spark.sql.kyuubi.hive.connector.read.convertMetastoreParquet`. When enabled, the data source PARQUET reader is used to process PARQUET tables created by using the HiveQL syntax, instead of Hive SerDe.
close https://github.com/apache/kyuubi/issues/7129
### How was this patch tested?
added unit test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes#7130 from flaming-archer/master_parquet_filterdown.
Closes#7129
d7059dca4 [tian bao] Support PARQUET hive table pushdown filter
Authored-by: tian bao <2011xuesong@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
Previously, the `HiveScan` class was used to read data. If it is determined to be ORC type, the `ORCScan` from Spark datasourcev2 can be used. `ORCScan` supports pushfilter down, but `HiveScan` does not yet support it.
In our testing, we are able to achieve approximately 2x performance improvement.
The conversation can be controlled by setting `spark.sql.kyuubi.hive.connector.read.convertMetastoreOrc`. When enabled, the data source ORC reader is used to process ORC tables created by using the HiveQL syntax, instead of Hive SerDe.
close https://github.com/apache/kyuubi/issues/7122
### How was this patch tested?
added unit test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes#7123 from flaming-archer/master_scanbuilder_new.
Closes#7122
c3f412f90 [tian bao] add case _
2be48909f [tian bao] Merge branch 'master_scanbuilder_new' of github.com:flaming-archer/kyuubi into master_scanbuilder_new
c825d0f8c [tian bao] review change
8a26d6a8a [tian bao] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/KyuubiHiveConnectorConf.scala
68d41969f [tian bao] review change
bed007fea [tian bao] review change
b89e6e67a [tian bao] Optimize UT
5a8941b2d [tian bao] fix failed ut
dc1ba47e3 [tian bao] orc pushdown version 0
Authored-by: tian bao <2011xuesong@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
There were some breaking changes after we fixed compatibility for Spark 4.0.0 RC1 in #6920, but now Spark has reached 4.0.0 RC6, which has less chance to receive more breaking changes.
### How was this patch tested?
Changes are extracted from https://github.com/apache/kyuubi/pull/6928, which passed CI with Spark 4.0.0 RC6
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes#7061 from pan3793/6920-followup.
Closes#6920
17a1bd9e5 [Cheng Pan] [KYUUBI #6920][FOLLOWUP] Spark SQL engine supports Spark 4.0
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### Why are the changes needed?
Test Spark 3.5.5 Release Notes
https://spark.apache.org/releases/spark-release-3-5-5.html
### How was this patch tested?
Pass GHA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes#6939 from pan3793/spark-3.5.5.
Closes#6939
8c0288ae5 [Cheng Pan] ga
78b0e72db [Cheng Pan] nit
686a7b0a9 [Cheng Pan] fix
d40cc5bba [Cheng Pan] Bump Spark 3.5.5
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
This pull request fixes #
## Describe Your Solution 🔧
Preparing v1.11.0-SNAPSHOT after branch-1.10 cut
```shell
build/mvn versions:set -DgenerateBackupPoms=false -DnewVersion="1.11.0-SNAPSHOT"
(cd kyuubi-server/web-ui && npm version "1.11.0-SNAPSHOT")
```
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6769 from bowenliang123/bump-1.11.
Closes#6769
6db219d28 [Bowen Liang] get latest_branch by sorting version in branch name
465276204 [Bowen Liang] update package.json
81f2865e5 [Bowen Liang] bump
Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
# 🔍 Description
Spark 4.0.0-preview2 RC1 passed the vote
https://lists.apache.org/thread/4ctj2mlgs4q2yb4hdw2jy4z34p5yw2b1
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Pass GHA.
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6699 from pan3793/spark-4.0.0-preview2.
Closes#6699
2db1f645d [Cheng Pan] 4.0.0-preview2
42055bb1e [Cheng Pan] fix
d29c0ef83 [Cheng Pan] disable delta test
98d323b95 [Cheng Pan] fix
2e782c00b [Cheng Pan] log4j-slf4j2-impl
fde4bb6ba [Cheng Pan] spark-4.0.0-preview2
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
This PR makes KSHC support Spark 4.0, and also makes sure that the KSHC jar compiled against Spark 3.5 is binary compatible with Spark 4.0.
We are ready to enable CI for Spark 4.0, except for authZ module.
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Pass GHA.
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6453 from pan3793/spark4-ci.
Closes#6453
695e3d7f7 [Cheng Pan] Update pom.xml
2eaa0f88a [Cheng Pan] Update .github/workflows/master.yml
b1f540a34 [Cheng Pan] cross test
562839982 [Cheng Pan] fix
9f0c2e1be [Cheng Pan] fix
45f182462 [Cheng Pan] kshc
227ef5bae [Cheng Pan] fix
690a3b8b2 [Cheng Pan] Revert "fix"
87fe7678b [Cheng Pan] fix
60f55dbed [Cheng Pan] CI for Spark 4.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
The `kyuubi-util-scala_2.12-<version>-tests.jar` accidentally leaked to the compile scope but should be in the test scope.
## Types of changes 🔖
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Run `build/dist` and check `dist/jars`
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6439 from pan3793/util-scala-test.
Closes#6439
0576248f5 [Cheng Pan] fix
2bf2408f5 [Cheng Pan] fix
f7151dfc6 [Cheng Pan] kyuubi-util-scala test jar leaked to compile scope
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
This pull request closes#6247
This also closes#6431
## Describe Your Solution 🔧
Add a job `spark-connector-cross-version-test` in GitHub Actions to:
1. Build KSHC package with maven opt `-Pspark-3.5`
2. Run KSHC tests with maven opt `-Pspark-3.3` and `-Pspark-3.4` and KSHC package built in step 1
3. Fix the binary-compatible issue via reflection.
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Pass GHA.
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6436 from zhouyifan279/kshc-cross-version-test.
Closes#6247
d3ac2ef47 [zhouyifan279] Tune the KSHC code to fix binary-compatible issues
4e14edcb5 [zhouyifan279] Fix invalid unit-tests-log name
56ca45d18 [zhouyifan279] Fix invalid unit-tests-log name
4c5ab7b9e [zhouyifan279] Update test log name
8a84e8812 [zhouyifan279] Add matrix scala
17cb67155 [zhouyifan279] [KYUUBI #6247] KSHC cross-version test
Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
This PR makes `javax.servlet` and `jakarta.servlet` co-exist, by introducing `javax.servlet-api-4.0.1` and upgrade `jakarta.servlet-api` to 5.0.0. (6.0.0 requires JDK 11)
Spark 4.0 migrated from `javax.servlet` to `jakarta.servlet` in SPARK-47118 while Kyuubi still uses `javax.servlet` in other modules, we should allow them to co-exist for a while.
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Pass GHA.
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6392 from pan3793/servlet.
Closes#6392
27d412599 [Cheng Pan] fix
9f1e72272 [Cheng Pan] other spark modules
f4545dc76 [Cheng Pan] fix
313826fa7 [Cheng Pan] exclude
7d5028154 [Cheng Pan] Support javax.servlet and jakarta.servlet co-exist
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
This pull request fixes https://github.com/apache/kyuubi/issues/6078, KSHC should handle the commit of the partitioned table as dynamic partition at write path, that's beacuse the process of writing with Apache Spark DataSourceV2 using dynamic partitioning to handle static partitions.
## Describe Your Solution 🔧
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
## Types of changes 🔖
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
#### Behavior With This Pull Request 🎉
#### Related Unit Tests
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6082 from Yikf/KYUUBI-6078.
Closes#6078
2ae183672 [yikaifei] KSHC should handle the commit of the partitioned table as dynamic partition at write path
Authored-by: yikaifei <yikaifei@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
SPARK-33212 (fixed in 3.2.0) moves from `hadoop-client` to shaded hadoop client, to simplify the dependency management, previously , we add some workaround to handle Spark 3.1 dependency issues. As we removed building support for Spark 3.1 now, we can remove those workaround to simplify `pom.xml`
## Describe Your Solution 🔧
As above.
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Pass GA.
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6131 from pan3793/3-1-cleanup.
Closes#6131
1341065a7 [Cheng Pan] nit
1d7323f6e [Cheng Pan] fix
9e2e3b747 [Cheng Pan] nit
271166b58 [Cheng Pan] test
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# 🔍 Description
## Issue References 🔗
This pull request is the next step of deprecating and removing support of Spark 3.1
VOTE: https://lists.apache.org/thread/670fx1qx7rm0vpvk8k8094q2d0fthw5b
VOTE RESULT: https://lists.apache.org/thread/0zdxg5zjnc1wpxmw9mgtsxp1ywqt6qvb
## Describe Your Solution 🔧
Drop module `kyuubi-extension-spark-3-1` and delete Spark 3.1 specific codes.
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Pass GA.
---
# Checklist 📝
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes#6125 from pan3793/drop-spark-ext-3-1.
Closes#6125
212012f18 [Cheng Pan] fix style
021532ccd [Cheng Pan] doc
329f69ab9 [Cheng Pan] address comments
43fac4201 [Cheng Pan] fix
a12c8062c [Cheng Pan] fix
dcf51c1a1 [Cheng Pan] minor
814a187a6 [Cheng Pan] Drop Kyuubi extension for Spark 3.1
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
This pr aims to fix https://github.com/apache/kyuubi/issues/5414.
`HiveReader` initialization incorrectly uses the global hadoopConf as hiveconf, which causes reader to pollut the global hadoopConf and cause job read failure.
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes#5424 from Yikf/orc-read.
Closes#5414
d6bdf7be4 [yikaifei] [KYUUBI #5414] Reader should not polluted the global hiveconf instance
Authored-by: yikaifei <yikaifei@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Remove the deprecated usage.
c780db754e/src/java.base/share/classes/java/lang/Class.java (L534-L535)
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No.
Closes#5426 from cxzl25/newInstance.
Closes#5426
dcb679b95 [sychen] avoid use class.newInstance directly
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
The Apache Spark Community found a performance regression with log4j2. See https://github.com/apache/spark/pull/36747.
This PR to fix the performance issue on our side.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No.
Closes#5400 from ITzhangqiang/KYUUBI_5365.
Closes#5365
dbb9d8b32 [ITzhangqiang] [KYUUBI #5365] Don't use Log4j2's extended throwable conversion pattern in default logging configurations
Authored-by: ITzhangqiang <itzhangqiang@163.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
close https://github.com/apache/kyuubi/issues/5317#issue-1904751001
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes#5319 from zhaomin1423/fixhive-connector.
Closes#5317
02e5321dc [Cheng Pan] nit
cadabf4ab [Cheng Pan] nit
d38832f40 [zhaomin] improve
ee5b62d84 [zhaomin] improve
794473468 [zhaomin] improve
e3eca91fb [zhaomin] add tests
d9302e2ba [zhaomin] [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table
0bc8ec16f [zhaomin] [KYUUBI #5317] [Bug] Hive Connector throws NotSerializableException on reading Hive Avro partitioned table
Lead-authored-by: zhaomin <zhaomin1423@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
This PR aims to unify the exception handling of v1 and v2 during dropDatabase
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes#5225 from Yikf/hive-connector.
Closes#5225
3be33af76 [yikaifei] [KSHC] Improve test
Authored-by: yikaifei <yikaifei@apache.org>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
### _Why are the changes needed?_
- to make Spark SQL hive connector plugin compilable on Scala 2.13 with Spark 3.3/3.4
- rename class name `FilePartitionReader` which is copied from Spark to `SparkFilePartitionReader`to fix the class mismatch error
```
[ERROR] [Error] /Users/bw/dev/incubator-kyuubi/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HivePartitionReaderFactory.scala:83: type mismatch;
found : Iterator[org.apache.kyuubi.spark.connector.hive.read.HivePartitionedFileReader[org.apache.spark.sql.catalyst.InternalRow]]
required: Iterator[org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader[org.apache.spark.sql.catalyst.InternalRow]]
```
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No.
Closes#5193 from bowenliang123/scala213-hivecon.
Closes#5193
d8c6bf5f0 [liangbowen] defer toMap
b20ad4eb1 [liangbowen] adapt spark hive connector plugin to Scala 2.13
Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: yikaifei <yikaifei@apache.org>
### _Why are the changes needed?_
- Change hardcoded Scala's version 2.12 in Maven module's `artifactId` to placeholder `scala.binary.version` which is defined in project parent pom as 2.12
- Preparation for Scala 2.13/3.x support in the future
- No impact on using or building Maven modules
- Some ignorable warning messages for unstable artifactId will be thrown by Maven.
```
Warning: Some problems were encountered while building the effective model for org.apache.kyuubi:kyuubi-server_2.12🫙1.8.0-SNAPSHOT
Warning: 'artifactId' contains an expression but should be a constant
```
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No.
Closes#5175 from bowenliang123/artifactId-scala.
Closes#5177
2eba29cfa [liangbowen] use placeholder of scala binary version for artifactId
Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
- Support initializing or comparing version with major version only, e.g "3" equivalent to "3.0"
- Remove redundant version comparison methods by using semantic versions of Spark, Flink and Kyuubi
- adding common `toDouble` method
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes#5039 from bowenliang123/improve-semanticversion.
Closes#5039
b6868264f [liangbowen] nit
d39646b7d [liangbowen] SPARK_ENGINE_RUNTIME_VERSION
9148caad0 [liangbowen] use semantic versions
ecc3b4af6 [mans2singh] [KYUUBI #5086] [KYUUBI # 5085] Update config section of deploy on kubernetes
Lead-authored-by: liangbowen <liangbowen@gf.com.cn>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_
This PR aims to fix a bug, In KSHC, `catalog.createTable` should use the correct provider.
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes#5022 from Yikf/KSHC-createTable.
Closes#5022
cd8cb1cf2 [yikaifei] CreateTable should use the correct provider
Authored-by: yikaifei <yikaifei@apache.org>
Signed-off-by: yikaifei <yikaifei@apache.org>
### _Why are the changes needed?_
This PR amins to support Parquet/Orc provider is splitable.
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes#5017 from Yikf/KSHC-support-split.
Closes#5017
9dc3d3d56 [yikaifei] Support Parquet/Orc provider is splitable
Authored-by: yikaifei <yikaifei@apache.org>
Signed-off-by: yikaifei <yikaifei@apache.org>
### _Why are the changes needed?_
As title, In KSHC, HiveTable's identify does not attach the catalog to prevent an incorrect catalogName. default catalog is "spark_catalog"
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes#5023 from Yikf/tableName2.
Closes#5023
86b6a58d0 [yikaifei] KSHC v1IdentifierNoCatalog in spark3.4
Authored-by: yikaifei <yikaifei@apache.org>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
### _Why are the changes needed?_
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes#5028 from zhaomin1423/fix_hive_connector.
Closes#5028
d9c7e9c8a [zhaomin] Update session hadoop conf to catalog hadoop conf
Authored-by: zhaomin <zhaomin1423@163.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
### _Why are the changes needed?_
This pr amins to make KSHC support Apache Spark 3.4.
- KSHC support Apache Spark 3.4
- Make Apache kyuubi `codecov` module contain the spark-3.4 profile. so that Apache kyubbi CI can cover some modules.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes#4999 from Yikf/kudu-spark3.4.
Closes#4999
6a35e54b8 [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
66bb742eb [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
7be517c7f [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
ae23133d1 [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
dda5e6521 [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
e43a25dff [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
54f52f16d [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveConnectorUtils.scala
0955b544b [Cheng Pan] Update pom.xml
38a1383d9 [yikaifei] codecov module should contain the spark 3.4 profile
Lead-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: yikaifei <yikaifei@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
There are hdfs-site.xml, hive-site, etc in spark job classpath, but we should use hadoop conf and hive conf from catalog options.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes#4995 from zhaomin1423/fix_hive_connector.
Closes#4995
64429fdcb [Xiao Zhao] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveTableCatalog.scala
d921be750 [zhaomin] fix
375934d65 [zhaomin] Using hadoop conf and hive conf from catalog options
Lead-authored-by: zhaomin <zhaomin1423@163.com>
Co-authored-by: Xiao Zhao <zhaomin1423@163.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
- To improve Scala code with corrections, simplification, scala style, redundancy cleaning-up. No feature changes introduced.
Corrections:
- Class doesn't correspond to file name (SparkListenerExtensionTest)
- Correct package name in ResultSetUtil and PySparkTests
Improvements:
- 'var' could be a 'val'
- GetOrElse(null) to orNull
Cleanup & Simplification:
- Redundant cast inspection
- Redundant collection conversion
- Simplify boolean expression
- Redundant new on case class
- Redundant return
- Unnecessary parentheses
- Unnecessary partial function
- Simplifiable empty check
- Anonymous function convertible to a method value
Scala Style:
- Constructing range for seq indices
- Get and getOrElse to getOrElse
- Convert expression to Single Abstract Method (SAM)
- Scala unnecessary semicolon inspection
- Map and getOrElse(false) to exists
- Map and flatten to flatMap
- Null initializer can be replaced by _
- scaladoc link to method
Other Improvements:
- Replace map and getOrElse(true) with forall
- Unit return type in the argument of map
- Size to length on arrays and strings
- Type check can be pattern matching
- Java mutator method accessed as parameterless
- Procedure syntax in method definition
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4959 from bowenliang123/scala-Improve.
Closes#4959
2d36ff351 [liangbowen] code improvement for Scala
Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_
Close#4870
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4872 from pan3793/util.
Closes#4870
0b9fe3cba [Cheng Pan] nit
ecc5ee4f2 [Cheng Pan] fix
63be7a20c [Cheng Pan] test
85363c187 [Cheng Pan] style
2227247dd [Cheng Pan] fix package
11d10a081 [Cheng Pan] Add kyuubi-util and kyuubi-util-scala modules
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Remove all transitive dependencies to make the down stream project easy to consume.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4620 from pan3793/kshc.
Closes#4620
407f669f5 [Cheng Pan] [KSHC] Cut off transitive dependenices
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
This PR aims to make Kyuubi Spark Hive Connector(KSHC) support kerberized HMS in cluster mode w/o keytab(which is the typical use case in Kyuubi) by implementing a `HadoopDelegationTokenProvider`.
To enable access to an kerberized HMS using KSHC, the minimal configurations are
```
spark.sql.catalog.warm=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.warm.hive.metastore.uris=<thrift-uris>
```
then it's able to run federation query across metastores
```
SELECT * FROM spark_catalog.db1.tbl1 JOIN warm.db2.tbl2 ON ...
```
In addition, it allows disabling token renewal for each catalog explicitly
```
spark.sql.catalog.warm.delegation.token.renewal.enabled=false
```
The current implementation has some limitations:
the catalog configuration must be present on the Spark application bootstrap, which means the catalog configurations should be set in `spark-defaults.conf` or append as `--conf` like:
```
spark-[sql|shell|submit] \
--conf spark.sql.catalog.xxx=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
--conf spark.sql.catalog.xxx.hive.abc=xyz
```
but does not work for dynamic registering through SET statement, e.g. `SET spark.sql.catalog.xxx=`
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
```
> (select count(*) from hive_2.mammut.test_7) union ( select count(*) from spark_catalog.test.test01 limit 1);
+-----------+
| count(1) |
+-----------+
| 4 |
| 1 |
+-----------+
2 rows selected (8.378 seconds)
```
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4560 from pan3793/shc-token.
Closes#4560
fe8cd0c6d [Cheng Pan] Centralized metastore token signature fallback logic
851159559 [Cheng Pan] comments
fc3b4d596 [Cheng Pan] hive.metastore.token.signature fallback to hive.metastore.uris
fb7eb033f [Cheng Pan] unused import
858b39024 [Cheng Pan] New catalog property delegation.token.renewal.enabled
28ec5a543 [Cheng Pan] disable hms client retry
52044d474 [Cheng Pan] update comments
33b241831 [Cheng Pan] [KSHC] Support Kerberos by implementing KyuubiHiveConnectorDelegationTokenProvider
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
This PR aims to close https://github.com/apache/kyuubi/issues/4525.
The root cause of this problem is that Apache Spark does predicate push-down in `V2ScanRelationPushDown`, but the spark-hive-connector does not apply push-down predicates for data filtering.
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4528 from Yikf/KYUUBI-4525.
Closes#4525
a65a1873f [Yikf] Partitioning predicates should take effect to filter data
Authored-by: Yikf <yikaifei@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Respect Java/Scala coding conventions in KSHC (Kyuubi Spark Hive Connector).
For singleton(`object` in Scala) invoking, use `AbcUtils.method(...)` instead of `abcUtils.method(...)`
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4488 from pan3793/shc-rename.
Closes#4488
ec9a80198 [Cheng Pan] nit
84d3bb413 [Cheng Pan] Keep object orignal name defined in HiveBridgeHelper
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
jobId across tasks should be consistent to meet the contract expected by Hadoop committers
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4432 from Yikf/jobid.
Closes#4432
4e7401c91 [Yikf] jobId across tasks should be consistent
Authored-by: Yikf <yikaifei@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
This pr aims to improve code for hive-connector FileWriterFactory, the main goal is to reduce duplicate copies of spark code.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4391 from Yikf/improve-code.
Closes#4391
7991f145 [Yikf] improve code for hive-connector FileWriterFactory
Authored-by: Yikf <yikaifei@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
[SPARK-41448](https://issues.apache.org/jira/browse/SPARK-41448) make consistent MR job IDs in FileBatchWriter and FileFormatWriter in Apache Spark 3.3.2, but it breaks a serializable issue, JobId is non-serializable.
And this pr aims to rewrite `FileWriterFactory` to circumvent the problem
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4359 from Yikf/FileWriterFactory.
Closes#4359
dd8c90fe [Cheng Pan] Update extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/FileWriterFactory.scala
1e5164ec [Yikf] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory
Lead-authored-by: Yikf <yikaifei@apache.org>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
fix#4222
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#4237 from jiaoqingbo/kyuubi4222.
Closes#4237
7677d69f [jiaoqingbo] code review
538d436a [jiaoqingbo] [Kyuubi #4222] Use hiveTableCatalog to updateTableStats instead of sessionCatalog
Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Avoid too much logs on console in CI.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3864 from pan3793/nit.
Closes#3864
39687176 [Cheng Pan] Add missing log4j2-test.xml for Kyuubi Spark Hive Connector
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Introduce code style check support for Maven's pom.xml with sortPom in spotless maven plugin.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3843 from bowenliang123/spotless-pom.
Closes#3842
3c654597 [liangbowen] apply to pom.xml
fd1536f7 [liangbowen] set expandEmptyElements to true
e498423f [liangbowen] apply spotless:apply to all pom.xml
e46bcfec [liangbowen] add pom style check support in spotless
Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Fix https://github.com/apache/incubator-kyuubi/issues/3529
The intent of this PR is the following:
- Add tests related to catalog, including the listTables, loadTable, and listNamespaces methods;
- Initialize the DDL test framework.
- Add CreateNamespaceSuite, DropNamespaceSuite and ShowTablesSuite to check for consistency with V1 in hive connector.
- Rectify the fault that namespaces are deleted in cascades. During cascades, ignore the exception that the table exists in the namespace.
- Fix the tableName problem of HiveTable, which should contain namespace name.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3530 from Yikf/hivev2-test.
Closes#3529
d0af0760 [yikf] Add tests to check for consistency with V1
Authored-by: yikf <yikaifei1@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Fix https://github.com/apache/incubator-kyuubi/issues/3464, currently, Kyuubi supports hive connector for read/write hive table, it is implemented based on the [Apache Spark DataSource V2](https://www.databricks.com/session/apache-spark-data-source-v2), but there's a potential issue;
Kyuubi use `kyuubi.engine.single.spark.session`=[false](https://kyuubi.apache.org/docs/latest/deployment/settings.html#:~:text=kyuubi.engine.single.spark.session) to provide concurrency sql execution in context isolation, this cause spark.newSession invoked for each transaction, in spark v1 catalog, externalCatalog is shared in the mutiple session, but in catalog v2 architecture, it's big different with v1, v2 catalogs are managed by `CatalogManager` which is [session level](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala#L97), this means that each session will have a separate catalogManager, so in the case of multiple sessions, hivecatalog will be initialized multiple times. this causes two problems: 1 multiple sessions may be wasted initializing multiple HiveExternalCatalog, which may cause the JVM namespace to swell. 2 multiple HiveClient connections may be initialized;
This issue aims to pool externalCatalog to address the potential issues mentioned above.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3465 from Yikf/catalog-pool.
Closes#3464
5e8a94dd [yikf] Support for pooling external catalog
Authored-by: yikf <yikaifei1@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
### _Why are the changes needed?_
Fix https://github.com/apache/incubator-kyuubi/issues/3437
This pr aims to refactory class location of the hive connector
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3438 from Yikf/hive-connector-rename.
Closes#3437
a41dd15b [yikf] Refactory class location of the hive connector
Authored-by: yikf <yikaifei1@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Fix https://github.com/apache/incubator-kyuubi/issues/3366
This PR is a subtask of Kyuubi's support for Hive data sources, and aims to support write code path
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3367 from Yikf/support-hive-write.
Closes#3366
85197a65 [yikf] Support hive write code path
Authored-by: yikf <yikaifei1@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
### _Why are the changes needed?_
Fix https://github.com/apache/incubator-kyuubi/issues/3378
This pr aims to improve hive-connector module tests and make CI perform tests for the hive connector
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3379 from Yikf/hive-connector-test.
Closes#3378
72ad050c [yikf] CI test for hive-connector
Authored-by: yikf <yikaifei1@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
In a modern database architecture, users may have a strong need for federated queries. Since there are a large number of Hive warehouse in the history database, we tried to implement the Hive V2 Datasource based on Spark Datasource V2 to meet this need. for the discussion, see :https://lists.apache.org/thread/fq8ywr58rzf9bycflj1q4fl1xyz2rq2w
This PR is the first step in fixing https://github.com/apache/incubator-kyuubi/issues/3259, having
- initialization implementation
- support read code path
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes#3260 from yikf/hive-v2-connector.
Closes#3259
753aca30 [yikf] Initial implementation of the Hive Connector based on the Spark datasource V2
Authored-by: yikf <yikaifei1@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>