9 Commits

Author SHA1 Message Date
Frank Bertsch
b49ed02f16
[KYUUBI #7106] Make response.results.columns optional
### Why are the changes needed?
Bugfix. Spark 3.5 is returning `None` for `response.results.columns`, while Spark 3.3 returned actual values.

The response here: https://github.com/apache/kyuubi/blob/master/python/pyhive/hive.py#L507

For a query that does nothing (mine was an `add jar s3://a/b/c.jar`), here are the responses I received.

Spark 3.3:
```
TFetchResultsResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), hasMoreRows=False, results=TRowSet(startRowOffset=0, rows=[], columns=[TColumn(boolVal=None, byteVal=None, i16Val=None, i32Val=None, i64Val=None, doubleVal=None, stringVal=TStringColumn(values=[], nulls=b'\x00'), binaryVal=None)], binaryColumns=None, columnCount=None))
```

Spark 3.5:
```
TFetchResultsResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), hasMoreRows=False, results=TRowSet(startRowOffset=0, rows=[], columns=None, binaryColumns=None, columnCount=None))
```
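
The difference between the two responses can be handled by treating a missing `columns` field as an empty list. The sketch below is only an illustration: `RowSet` and `columns_or_empty` are hypothetical stand-ins, not the Thrift classes or the actual patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowSet:
    """Hypothetical stand-in for the Thrift TRowSet in the responses above."""
    columns: Optional[List[list]] = None


def columns_or_empty(row_set: RowSet) -> List[list]:
    # Spark 3.5 may send columns=None for a statement with no result set,
    # so fall back to an empty list instead of iterating over None.
    return row_set.columns if row_set.columns is not None else []


print(columns_or_empty(RowSet(columns=None)))      # Spark 3.5 style response
print(columns_or_empty(RowSet(columns=[[1, 2]])))  # response that carries a column
```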

### How was this patch tested?
I tested by applying it locally and running my query against Spark 3.5. I was not able to get any unit tests running, sorry!

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #7107 from fbertsch/spark_3_5_fix.

Closes #7106

13d1440a8 [Frank Bertsch] Make response.results.columns optional

Authored-by: Frank Bertsch <fbertsch@netflix.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-06-23 23:28:18 +08:00
John Zhang
c19d923b85
[KYUUBI #7048] Fix KeyError when parsing unknown Hive type_id in schema inspection
This patch adds a `try/except` block to prevent a `KeyError` when mapping an unknown `type_id` during Hive schema parsing. If a `type_id` is not recognized, `type_code` is now set to `None` instead of raising an exception.

### Why are the changes needed?

Previously, when parsing Hive table schemas, the code mapped each `type_id` to a human-readable type name via `ttypes.TTypeId._VALUES_TO_NAMES[type_id]`. If Hive returned an unknown or custom type (for example, when a deployment runs a non-standard version, or when data is pumped into Hive from a different source such as *Oracle*), a `KeyError` was raised and the entire SQL query was interrupted. This patch adds a `try/except` block so that an unrecognized `type_id` sets `type_code` to `None` instead of raising, which lets the downstream user decide how to handle it. This makes schema inspection more robust and compatible with evolving Hive data types.
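
A minimal sketch of the lookup pattern described above. The `VALUES_TO_NAMES` dict is a small illustrative stand-in for `ttypes.TTypeId._VALUES_TO_NAMES`, and `type_code_for` is a hypothetical helper name, not a function from the code base.

```python
# Illustrative subset only; the real table is ttypes.TTypeId._VALUES_TO_NAMES.
VALUES_TO_NAMES = {0: "BOOLEAN_TYPE", 3: "INT_TYPE", 7: "STRING_TYPE"}


def type_code_for(type_id):
    """Return the type name for type_id, or None if the id is unknown."""
    try:
        return VALUES_TO_NAMES[type_id]
    except KeyError:
        # Unknown or custom type: let the caller decide what to do.
        return None


print(type_code_for(3))     # known id
print(type_code_for(9999))  # unknown id
```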

### How was this patch tested?

The patch was tested by running schema inspection on tables containing both standard and unknown/custom Hive column types. For known types, parsing behaves as before. For unknown types, the parser sets `type_code` to `None` without raising an exception, and the rest of the process completes successfully. No unit test was added, since this edge case depends on unreachable or custom Hive types, but the change was tested on typical use cases.

### Was this patch authored or co-authored using generative AI tooling?

No. 😂 It's a minor patch.

Closes #7048 from ZsgsDesign/patch-1.

Closes #7048

4d246d0ec [John Zhang] fix: handle KeyError when parsing Hive type_id mapping

Authored-by: John Zhang <zsgsdesign@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
2025-04-29 10:41:16 +08:00
Alex Wojtowicz
9daf74d9c3
[KYUUBI #6908] Connection class ssl context object parameter
**Why are the changes needed:**
I am trying to connect to a HiveServer2 instance behind an NGINX proxy that requires mTLS. PyHive seems to lack the ability to establish an mTLS connection, for example from applications such as Airflow that communicate directly with the HiveServer2 instance.

The change adds the ability to pass in the parameters needed to establish a proper mTLS SSL context. I believe that accepting a caller-created `ssl_context` object is the quickest and cleanest way to do so, leaving the responsibility of configuring it to downstream implementations and users. It also cuts down on code length.
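
As a rough illustration of what a caller-side context could look like, the sketch below builds an `ssl.SSLContext` with the standard library. The helper name `build_mtls_context` and its parameters are hypothetical; how the resulting object is handed to the connection is only indicated in a comment and should be checked against the actual `Connection` signature.

```python
import ssl
from typing import Optional


def build_mtls_context(ca_file: Optional[str] = None,
                       cert_file: Optional[str] = None,
                       key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side context that verifies the server and,
    when a client certificate is given, presents it for mTLS."""
    # Verifies the server certificate and hostname by default.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # The client certificate and key are what make the handshake mutual.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


context = build_mtls_context()
# Assumption: the connection is then created with this object, e.g. by
# passing ssl_context=context to the Connection class.
```

Keeping the configuration in the caller means proxies with different certificate requirements need no further changes to the client library.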

**How was this patch tested:**
Corresponding pytest fixtures have been added, using the mock module to check whether the `ssl_context` object was properly accessed, or whether the default one created during `Connection` initialization was properly configured.

I was not able to run the pytest fixtures specifically because I was lacking the JDBC driver. This is my first time contributing to open source, and I am happy to run the tests if given guidance. The change passed a clean build and test of the entire Kyuubi project in my local dev environment.

**Was this patch authored or co-authored using generative AI tooling**
Yes, Generated-by Cursor-AI with Claude Sonnet 3.5 agent

Closes #6935 from alexio215/connection-class-ssl-context-param.

Closes #6908

539b29962 [Cheng Pan] Update python/pyhive/tests/test_hive.py
14c607489 [Alex Wojtowicz] Simplified testing, following pattern of other tests, need proper SSL setup with nginx to test ssl_context fully
b947f2454 [Alex Wojtowicz] Added exception handling since JDBC driver will not run in python tests
11f9002bf [Alex Wojtowicz] Passing in fully configured mock object before creating connection
009c5cf24 [Alex Wojtowicz] Added back doc string documentation
e3280bcd8 [Alex Wojtowicz] Python testing
529de8a12 [Alex Wojtowicz] Added ssl_context object. If no obj is provided, then it continues to use default provided parameters

Lead-authored-by: Alex Wojtowicz <awojtowi@akamai.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-02-25 22:22:14 +08:00
Octavian Ciubotaru
2a2e4c2123
[KYUUBI #6905] PyHive HTTP/HTTPS dialect to use the database name from url
### Why are the changes needed?
The HTTP dialect ignores the database specified in the URL and uses `default` instead.
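
The intended behavior can be sketched with the standard library alone. This is an illustration, not the dialect code: the real dialect works with SQLAlchemy's URL object, and `database_from_url` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def database_from_url(url: str, default: str = "default") -> str:
    """Return the database named in the URL path, or the default if none is given."""
    path = urlsplit(url).path
    # "/sales" -> "sales"; "" or "/" -> "" which falls back to the default.
    name = path.lstrip("/").split("/", 1)[0]
    return name or default


print(database_from_url("hive+https://host:10001/sales"))
print(database_from_url("hive+https://host:10001/"))
```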

### How was this patch tested?
Tested manually.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #6906 from developster/pyhive-update1.

Closes #6905

6e21d7259 [Cheng Pan] Update python/pyhive/sqlalchemy_hive.py
ec7d4629e [Octavian Ciubotaru] [KYUUBI #6905] PyHive HTTP/HTTPS dialect to use the database name from url

Lead-authored-by: Octavian Ciubotaru <ociubotaru@developmentgateway.org>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2025-02-14 10:30:32 +08:00
Bruce Wong
d3e17680f5 [KYUUBI #6485] Fix the Presto TABLE NOT FOUND error message that failed to match
# 🔍 Description
## Issue References 🔗

This pull request fixes #6485

## Describe Your Solution 🔧

Match table names case-insensitively when using regular expressions to detect the TABLE NOT FOUND error message.
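
A minimal sketch of case-insensitive matching with `re.IGNORECASE`. The error message text and the helper `is_table_not_found` are assumptions made for illustration; the pattern used in the actual dialect may differ.

```python
import re


def is_table_not_found(message: str, table_name: str) -> bool:
    """Check whether an error message reports that table_name does not exist."""
    # Hypothetical message shape: "... Table <catalog.schema.>name does not exist".
    pattern = r"Table\s+'?[\w.]*" + re.escape(table_name) + r"'?\s+does not exist"
    # IGNORECASE lets "MyTable" match a message that echoes "mytable".
    return re.search(pattern, message, flags=re.IGNORECASE) is not None


message = "line 1:15: Table hive.default.mytable does not exist"
print(is_table_not_found(message, "MyTable"))
```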

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

Added unit tests when table names have capital letters.

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6605 from BruceWong96/fix-presto-regex.

Closes #6485

06f737f24 [Bruce Wong] Fix typos
93071754a [Bruce Wong] Added unit tests for table names with both upper and lower case letters
9837030a1 [Bruce Wong] fix table not found

Authored-by: Bruce Wong <603334301@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-08-12 08:38:09 +00:00
wenjie.wang01
a0b9873f81
[KYUUBI #6489] [PYTHON] PyKyuubi get_table_names also supports Spark SQL dialect
# 🔍 Description
## Issue References 🔗

This pull request fixes #6489

## Describe Your Solution 🔧

After investigating, I found the bug and a solution.
The function `get_table_names` returns an incorrect value when Superset connects to Kyuubi for Spark SQL.
[get_table_names](https://github.com/apache/kyuubi/blob/master/python/pyhive/sqlalchemy_hive.py#L380)

The following code is used when connecting to Hive directly:
`return [row[0] for row in connection.execute(text(query))]`

This works because Hive returns the following value.

show tables in default :
[('student',), ('student_scores',)]

The following code is used when connecting to Kyuubi:
`return [row[1] for row in connection.execute(text(query))]`

This is needed because Kyuubi returns the following value.

show tables in default :
[('default', 'employees', False), ('default', 'student', False), ('default', 'student_scores', False)]

So, to account for the difference in return values, I modified the code.
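
One way to handle both row shapes is to pick the column by row width. This is a sketch under that assumption, using the sample rows above; `table_names_from_rows` is a hypothetical helper and the actual patch may branch differently.

```python
def table_names_from_rows(rows):
    """Extract table names from `show tables` rows of either shape."""
    names = []
    for row in rows:
        if len(row) == 1:
            # Hive shape: ('student',)
            names.append(row[0])
        else:
            # Kyuubi / Spark SQL shape: ('default', 'student', False)
            names.append(row[1])
    return names


hive_rows = [('student',), ('student_scores',)]
kyuubi_rows = [('default', 'employees', False), ('default', 'student', False)]
print(table_names_from_rows(hive_rows))
print(table_names_from_rows(kyuubi_rows))
```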

I tested both cases in Superset, and the code works.

Hive
<img width="1214" alt="image" src="https://github.com/apache/kyuubi/assets/29974394/9048b21d-053e-4b5d-be35-ba29d3bd6848">

Kyuubi
<img width="1085" alt="image" src="https://github.com/apache/kyuubi/assets/29974394/d600dfed-1127-41ea-a0bf-ca662a5487df">

Spark SQL also works properly.
<img width="1199" alt="image" src="https://github.com/apache/kyuubi/assets/29974394/7026e39e-6d63-473d-9e43-eeab580719ea">

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6490 from BruceWong96/branch-kyuubi-6489.

Closes #6489

94a52c0e5 [wenjie.wang01] add else branch.
8ab20becf [wenjie.wang01] fix bug for function get_table_names.
136c7b795 [wenjie.wang01] fix bug for function get_table_names.

Authored-by: wenjie.wang01 <wenjie.wang01@liulishuo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-06-21 19:03:43 +08:00
Harry
06af125b9f
[KYUUBI #6281][PY] Enable hive test in python client
# 🔍 Description
## Issue References 🔗

This pull request enables running Hive test cases in the Python client. One trivial case is not covered yet, and two others require a proper container setup.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️
Hive test disabled in #6343

#### Behavior With This Pull Request 🎉
Can cover hive test cases

#### Related Unit Tests
No

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6381 from sudohainguyen/ci/hive.

Closes #6281

a861382b1 [Harry] [KYUUBI #6281][PY] Enable hive test in python client

Authored-by: Harry <quanghai.ng1512@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-05-15 14:55:44 +08:00
Harry
9075fbb623
[KYUUBI #6281][PY] Initialize github action for python unit testing
# 🔍 Description
## Issue References 🔗

This pull request fixes #6281

## Describe Your Solution 🔧

This change initializes a CI job to run unit tests for the Python client, including:
- Set up GitHub Actions based on docker-compose
- Update test cases; tests now succeed for the `presto` and `trino` dialects
- Temporarily disable Hive-related tests because the test cases themselves are not valid (the issue is not about the connection)
- Update dev dependencies to support Python 3.10
- Speed up testing with `pytest-xdist` plugin

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️
Not able to run unit tests locally or on CI

#### Behavior With This Pull Request 🎉
Able to run and partially cover a couple of test cases

#### Related Unit Tests
No

## Additional notes
The next step is to fix the failing test cases, or to consider skipping some of them if necessary

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6343 from sudohainguyen/ci/init.

Closes #6281

682e575c4 [Harry] Remove xdist out of scope
dc42ca1ff [Harry] Pin pytest packages version
469f1d955 [Harry] Pin ubuntu version
00cef476a [Harry] Use v4 checkout action
96ef83148 [Harry] Remove unnecessary steps
732344a2c [Harry] Add step to tear down containers
1e2c2481a [Harry] Resolved trino and presto test
5b33e3924 [Harry] Make tests runnable
1be033ba3 [Harry] Remove random flag which causes failed test run
2bc6dc036 [Harry] Switch action setup provider to docker
ea2a76319 [Harry] Initialize github action for python unit testing

Authored-by: Harry <quanghai.ng1512@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-05-07 18:05:03 +08:00
Cheng Pan
f8c7b93f55
[KYUUBI #5686][FOLLOWUP] Rename pyhive to python
# 🔍 Description

This is the follow-up of #5686, renaming `./pyhive` to `./python`, and also adding `**/python/*` to RAT exclusion list temporarily.

"PyHive" may not be a suitable name now that the code is part of Apache Kyuubi. Let's use a generic directory name, `python`, and discuss the official name later (we will probably keep the code at `./python` eventually).

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

Recover RAT checked.

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6279 from pan3793/pyhive-1.

Closes #5686

42d338e71 [Cheng Pan] [KYUUBI #5686][FOLLOWUP] Rename pyhive to python

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-04-09 20:30:02 +08:00