[KYUUBI #6281][PY] Initialize github action for python unit testing

# 🔍 Description
## Issue References 🔗

This pull request fixes #6281

## Describe Your Solution 🔧

The change initialize a CI job to run unit testing on python client, including:
- Set up Github Action based on docker-compose
- Update test cases and test succeeded for dialect `presto` and `trino`
- Temporary disabled hive related test due to test cases are not valid, not about connection
- Update dev dependencies to support python 3.10
- Speed up testing with `pytest-xdist` plugin

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️
Not able to ran unit test in local and on CI

#### Behavior With This Pull Request 🎉
Able to run and partially cover a couple of test cases

#### Related Unit Tests
No

## Additional notes
Next action is about fixing failing test cases or considering skipping some of them if necessary

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6343 from sudohainguyen/ci/init.

Closes #6281

682e575c4 [Harry] Remove xdist out of scope
dc42ca1ff [Harry] Pin pytest packages version
469f1d955 [Harry] Pin ubuntu version
00cef476a [Harry] Use v4 checkout action
96ef83148 [Harry] Remove unnecessary steps
732344a2c [Harry] Add step to tear down containers
1e2c2481a [Harry] Resolved trino and presto test
5b33e3924 [Harry] Make tests runnable
1be033ba3 [Harry] Remove randome flag which causes failed test run
2bc6dc036 [Harry] Switch action setup provider to docker
ea2a76319 [Harry] Initialize github action for python unit testing

Authored-by: Harry <quanghai.ng1512@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
This commit is contained in:
Harry 2024-05-07 18:05:03 +08:00 committed by Cheng Pan
parent 28d8c8ec78
commit 9075fbb623
No known key found for this signature in database
GPG Key ID: 8001952629BCC75D
26 changed files with 210 additions and 126 deletions

65
.github/workflows/python.yml vendored Normal file
View File

@ -0,0 +1,65 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Python Client
on:
push:
branches:
- master
- branch-*
pull_request:
branches:
- master
- branch-*
concurrency:
group: python-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
unit-test:
runs-on: ubuntu-22.04
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10"]
env:
PYTHONHASHSEED: random
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Start Testing Containers
run: |
cd python/docker/
docker compose up -d --wait
docker compose exec hive-server /opt/hive/scripts/make_test_tables.sh
- name: Install dependencies
run: |
cd python
./scripts/install-deps.sh
- name: Run tests
run: |
cd python
pytest -v
- name: Tear down Containers
run: |
cd python/docker/
docker compose down --volumes

19
python/.gitignore vendored Normal file
View File

@ -0,0 +1,19 @@
cover/
.coverage
/dist/
/build/
.DS_Store
*.egg
/env/
/htmlcov/
.idea/
.project
*.pyc
.pydevproject
/*.egg-info/
.settings
.cache/
*.iml
/scripts/.thrift_gen
venv/
.envrc

View File

@ -2,13 +2,11 @@
flake8==3.4.1
mock==2.0.0
pycodestyle==2.3.1
pytest==3.2.1
pytest-cov==2.5.1
pytest-flake8==0.8.1
pytest-random==0.2
pytest-timeout==1.2.0
pytest==7.4.4
pytest-cov==5.0.0
# actual dependencies: let things break if a package changes
sqlalchemy>=1.3.0
requests>=1.0.0
requests_kerberos>=0.12.0
sasl>=0.2.1

View File

@ -0,0 +1,2 @@
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083

View File

@ -0,0 +1,2 @@
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083

View File

@ -0,0 +1,61 @@
version: "3"
services:
namenode:
image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
volumes:
- namenode:/hadoop/dfs/name
environment:
- CLUSTER_NAME=test
env_file:
- hadoop-hive.env
ports:
- "50070:50070"
datanode:
image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
volumes:
- datanode:/hadoop/dfs/data
env_file:
- hadoop-hive.env
environment:
SERVICE_PRECONDITION: "namenode:50070"
ports:
- "50075:50075"
hive-server:
image: bde2020/hive:2.3.2-postgresql-metastore
env_file:
- hadoop-hive.env
volumes:
- ../scripts:/opt/hive/scripts
environment:
HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
SERVICE_PRECONDITION: "hive-metastore:9083"
ports:
- "10000:10000"
hive-metastore:
image: bde2020/hive:2.3.2-postgresql-metastore
env_file:
- hadoop-hive.env
command: /opt/hive/bin/hive --service metastore
environment:
SERVICE_PRECONDITION: "namenode:50070 datanode:50075 hive-metastore-postgresql:5432"
ports:
- "9083:9083"
hive-metastore-postgresql:
image: bde2020/hive-metastore-postgresql:2.3.0
presto-coordinator:
image: shawnzhu/prestodb:0.181
ports:
- "8080:8080"
volumes:
- ./conf/presto/:/etc/presto
trino:
image: trinodb/trino:351
ports:
- "18080:18080"
volumes:
- ./conf/trino:/etc/trino
volumes:
namenode:
datanode:

View File

@ -0,0 +1,30 @@
HIVE_SITE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://hive-metastore-postgresql/metastore
HIVE_SITE_CONF_javax_jdo_option_ConnectionDriverName=org.postgresql.Driver
HIVE_SITE_CONF_javax_jdo_option_ConnectionUserName=hive
HIVE_SITE_CONF_javax_jdo_option_ConnectionPassword=hive
HIVE_SITE_CONF_datanucleus_autoCreateSchema=false
HIVE_SITE_CONF_hive_metastore_uris=thrift://hive-metastore:9083
HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
CORE_CONF_hadoop_http_staticuser_user=root
CORE_CONF_hadoop_proxyuser_hue_hosts=*
CORE_CONF_hadoop_proxyuser_hue_groups=*
HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_permissions_enabled=false
YARN_CONF_yarn_log___aggregation___enable=true
YARN_CONF_yarn_resourcemanager_recovery_enabled=true
YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
YARN_CONF_yarn_resourcemanager_fs_state___store_uri=/rmstate
YARN_CONF_yarn_nodemanager_remote___app___log___dir=/app-logs
YARN_CONF_yarn_log_server_url=http://historyserver:8188/applicationhistory/logs/
YARN_CONF_yarn_timeline___service_enabled=true
YARN_CONF_yarn_timeline___service_generic___application___history_enabled=true
YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
YARN_CONF_yarn_timeline___service_hostname=historyserver
YARN_CONF_yarn_resourcemanager_address=resourcemanager:8032
YARN_CONF_yarn_resourcemanager_scheduler_address=resourcemanager:8030
YARN_CONF_yarn_resourcemanager_resource__tracker_address=resourcemanager:8031

View File

@ -17,6 +17,7 @@ import unittest
from decimal import Decimal
import mock
import pytest
import thrift.transport.TSocket
import thrift.transport.TTransport
import thrift_sasl
@ -30,11 +31,12 @@ from pyhive.tests.dbapi_test_case import with_cursor
_HOST = 'localhost'
@pytest.mark.skip(reason="Temporary disabled")
class TestHive(unittest.TestCase, DBAPITestCase):
__test__ = True
def connect(self):
return hive.connect(host=_HOST, configuration={'mapred.job.tracker': 'local'})
return hive.connect(host=_HOST, port=10000, configuration={'mapred.job.tracker': 'local'})
@with_cursor
def test_description(self, cursor):
@ -151,10 +153,11 @@ class TestHive(unittest.TestCase, DBAPITestCase):
self.assertIsNone(cursor.description)
self.assertRaises(hive.ProgrammingError, cursor.fetchone)
@pytest.mark.skip
def test_ldap_connection(self):
rootdir = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
orig_ldap = os.path.join(rootdir, 'scripts', 'travis-conf', 'hive', 'hive-site-ldap.xml')
orig_none = os.path.join(rootdir, 'scripts', 'travis-conf', 'hive', 'hive-site.xml')
orig_ldap = os.path.join(rootdir, 'scripts', 'conf', 'hive', 'hive-site-ldap.xml')
orig_none = os.path.join(rootdir, 'scripts', 'conf', 'hive', 'hive-site.xml')
des = os.path.join('/', 'etc', 'hive', 'conf', 'hive-site.xml')
try:
subprocess.check_call(['sudo', 'cp', orig_ldap, des])
@ -209,11 +212,12 @@ class TestHive(unittest.TestCase, DBAPITestCase):
with contextlib.closing(conn.cursor()) as cursor:
cursor.execute('SELECT * FROM one_row')
self.assertEqual(cursor.fetchall(), [(1,)])
@pytest.mark.skip
def test_custom_connection(self):
rootdir = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
orig_ldap = os.path.join(rootdir, 'scripts', 'travis-conf', 'hive', 'hive-site-custom.xml')
orig_none = os.path.join(rootdir, 'scripts', 'travis-conf', 'hive', 'hive-site.xml')
orig_ldap = os.path.join(rootdir, 'scripts', 'conf', 'hive', 'hive-site-custom.xml')
orig_none = os.path.join(rootdir, 'scripts', 'conf', 'hive', 'hive-site.xml')
des = os.path.join('/', 'etc', 'hive', 'conf', 'hive-site.xml')
try:
subprocess.check_call(['sudo', 'cp', orig_ldap, des])

View File

@ -13,6 +13,7 @@ from decimal import Decimal
import requests
import pytest
from pyhive import exc
from pyhive import presto
from pyhive.tests.dbapi_test_case import DBAPITestCase
@ -231,6 +232,7 @@ class TestPresto(unittest.TestCase, DBAPITestCase):
).cursor()
)
@pytest.mark.skip(reason='This test requires a proxy server running on localhost:9999')
def test_requests_kwargs(self):
connection = presto.connect(
host=_HOST, port=_PORT, source=self.id(),

View File

@ -5,6 +5,7 @@ from pyhive.sqlalchemy_hive import HiveDate
from pyhive.sqlalchemy_hive import HiveDecimal
from pyhive.sqlalchemy_hive import HiveTimestamp
from sqlalchemy.exc import NoSuchTableError, OperationalError
import pytest
from pyhive.tests.sqlalchemy_test_case import SqlAlchemyTestCase
from pyhive.tests.sqlalchemy_test_case import with_engine_connection
from sqlalchemy import types
@ -60,6 +61,7 @@ _ONE_ROW_COMPLEX_CONTENTS = [
# ]
@pytest.mark.skip(reason="Temporarily disabled")
class TestSqlAlchemyHive(unittest.TestCase, SqlAlchemyTestCase):
def create_engine(self):
return create_engine('hive://localhost:10000/default')

View File

@ -70,10 +70,10 @@ class TestTrino(TestPresto):
('timestamp', 'timestamp', None, None, None, None, True),
('binary', 'varbinary', None, None, None, None, True),
('array', 'array(integer)', None, None, None, None, True),
('map', 'map(integer,integer)', None, None, None, None, True),
('struct', 'row(a integer,b integer)', None, None, None, None, True),
('map', 'map(integer, integer)', None, None, None, None, True),
('struct', 'row(a integer, b integer)', None, None, None, None, True),
# ('union', 'varchar', None, None, None, None, True),
('decimal', 'decimal(10,1)', None, None, None, None, True),
('decimal', 'decimal(10, 1)', None, None, None, None, True),
])
rows = cursor.fetchall()
expected = [(

10
python/scripts/install-deps.sh Executable file
View File

@ -0,0 +1,10 @@
#!/bin/bash -eux
source /etc/lsb-release
sudo apt-get -q update
sudo apt-get -q install -y g++ libsasl2-dev libkrb5-dev
pip install --upgrade pip
pip install -r dev_requirements.txt
pip install -e .

View File

@ -1,2 +0,0 @@
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083

View File

@ -1,2 +0,0 @@
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083

View File

@ -1,96 +0,0 @@
#!/bin/bash -eux
source /etc/lsb-release
echo "deb [arch=amd64] https://archive.cloudera.com/${CDH}/ubuntu/${DISTRIB_CODENAME}/amd64/cdh ${DISTRIB_CODENAME}-cdh${CDH_VERSION} contrib
deb-src https://archive.cloudera.com/${CDH}/ubuntu/${DISTRIB_CODENAME}/amd64/cdh ${DISTRIB_CODENAME}-cdh${CDH_VERSION} contrib" | sudo tee /etc/apt/sources.list.d/cloudera.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 327574EE02A818DD
sudo apt-get -q update
sudo apt-get -q install -y oracle-java8-installer python-dev g++ libsasl2-dev maven
sudo update-java-alternatives -s java-8-oracle
#
# LDAP
#
sudo apt-get -q -y --no-install-suggests --no-install-recommends --force-yes install ldap-utils slapd
sudo mkdir -p /tmp/slapd
sudo slapd -f $(dirname $0)/ldap_config/slapd.conf -h ldap://localhost:3389 &
while ! nc -vz localhost 3389; do sleep 1; done
sudo ldapadd -h localhost:3389 -D cn=admin,dc=example,dc=com -w test -f $(dirname $0)/../pyhive/tests/ldif_data/base.ldif
sudo ldapadd -h localhost:3389 -D cn=admin,dc=example,dc=com -w test -f $(dirname $0)/../pyhive/tests/ldif_data/INITIAL_TESTDATA.ldif
#
# Hive
#
sudo apt-get -q install -y --force-yes hive
javac -cp /usr/lib/hive/lib/hive-service.jar $(dirname $0)/travis-conf/com/dropbox/DummyPasswdAuthenticationProvider.java
jar cf $(dirname $0)/dummy-auth.jar -C $(dirname $0)/travis-conf com
sudo cp $(dirname $0)/dummy-auth.jar /usr/lib/hive/lib
# Hack around broken symlink in Hive's installation
# /usr/lib/hive/lib/zookeeper.jar -> ../../zookeeper/zookeeper.jar
# Without this, Hive fails to start up due to failing to find ZK classes.
sudo ln -nsfv /usr/share/java/zookeeper.jar /usr/lib/hive/lib/zookeeper.jar
sudo mkdir -p /user/hive
sudo chown hive:hive /user/hive
sudo cp $(dirname $0)/travis-conf/hive/hive-site.xml /etc/hive/conf/hive-site.xml
sudo apt-get -q install -y --force-yes hive-metastore hive-server2 || (grep . /var/log/hive/* && exit 2)
while ! nc -vz localhost 9083; do sleep 1; done
while ! nc -vz localhost 10000; do sleep 1; done
sudo -Eu hive $(dirname $0)/make_test_tables.sh
#
# Presto
#
sudo apt-get -q install -y python # Use python2 for presto server
mvn -q org.apache.maven.plugins:maven-dependency-plugin:3.0.0:copy \
-Dartifact=com.facebook.presto:presto-server:${PRESTO}:tar.gz \
-DoutputDirectory=.
tar -x -z -f presto-server-*.tar.gz
rm -rf presto-server
mv presto-server-*/ presto-server
cp -r $(dirname $0)/travis-conf/presto presto-server/etc
/usr/bin/python2.7 presto-server/bin/launcher.py start
#
# Trino
#
sudo apt-get -q install -y python # Use python2 for trino server
mvn -q org.apache.maven.plugins:maven-dependency-plugin:3.0.0:copy \
-Dartifact=io.trino:trino-server:${TRINO}:tar.gz \
-DoutputDirectory=.
tar -x -z -f trino-server-*.tar.gz
rm -rf trino-server
mv trino-server-*/ trino-server
cp -r $(dirname $0)/travis-conf/trino trino-server/etc
/usr/bin/python2.7 trino-server/bin/launcher.py start
#
# Python
#
pip install $SQLALCHEMY
pip install -e .
pip install -r dev_requirements.txt
# Sleep so Presto has time to start up.
# Otherwise we might get 'No nodes available to run query' or 'Presto server is still initializing'
while ! grep -q 'SERVER STARTED' /tmp/presto/data/var/log/server.log; do sleep 1; done
# Sleep so Trino has time to start up.
# Otherwise we might get 'No nodes available to run query' or 'Presto server is still initializing'
while ! grep -q 'SERVER STARTED' /tmp/trino/data/var/log/server.log; do sleep 1; done

View File

@ -2,20 +2,9 @@
tag_build =
[tool:pytest]
timeout = 100
timeout_method = thread
addopts = --random --tb=short --cov pyhive --cov-report html --cov-report term --flake8
addopts = --tb=short --cov pyhive --cov-report html --cov-report term
norecursedirs = env
python_files = test_*.py
flake8-max-line-length = 100
flake8-ignore =
TCLIService/*.py ALL
pyhive/sqlalchemy_backports.py ALL
presto-server/** ALL
pyhive/hive.py F405
pyhive/presto.py F405
pyhive/trino.py F405
W503
filterwarnings =
error
# For Python 2 flake8