
Kyuubi

Kyuubi is an enhanced edition of Apache Spark's original Thrift JDBC/ODBC Server. It is mainly designed for running SQL directly against a cluster whose components, including HDFS, YARN, the Hive MetaStore, and Kyuubi itself, are all secured. Kyuubi is a Spark SQL Thrift service with end-to-end multi-tenancy guarantees. Please see Kyuubi Architecture to learn more if you are interested.

Basically, the Thrift JDBC/ODBC Server is Spark SQL's counterpart to Apache Hive's HiveServer2: an ad-hoc SQL query service that acts as a distributed query engine with a JDBC/ODBC or command-line interface. In this mode, end users or applications can interact with Spark SQL directly to run SQL queries, without needing to write any code. With BI tools that support JDBC/ODBC connections, such as Tableau and NetEase YouData, we can build rich business reports over massive data. Profiting from Apache Spark's capabilities, we can achieve far better performance than Apache Hive as a SQL-on-Hadoop service.

Unfortunately, due to the limitations of Spark's own architecture, the Thrift JDBC/ODBC Server falls short as an enterprise-class product: compared with HiveServer2 it lacks multi-tenant isolation, authentication/authorization, high concurrency, high availability, and so on. Moreover, the Apache Spark community's support for this module has been in a state of prolonged stagnation.

Kyuubi has enhanced the Thrift JDBC/ODBC Server to solve these problems, as shown below.

Features (compared with the Spark Thrift Server):

  • multiple SparkContext — user-tagged SparkContext
  • lazy SparkContext — session-level SparkContext
  • SparkContext cache — SparkContext cache management
  • dynamic queue — Kyuubi identifies spark.yarn.queue in the connection string
  • session level configurations — spark.sql.* settings and dynamic resource requesting
  • authentication — Authentication/Security Guide
  • authorization — Kyuubi ACL Management Guide
  • impersonation — Kyuubi fully supports hive.server2.proxy.user and hive.server2.doAs
  • multi tenancy — based on the above features, Kyuubi is able to run as a multi-tenant server on an LCE-supported YARN cluster
  • operation log — Kyuubi redirects SQL operation logs to a local file, with an interface for the client to fetch them
  • high availability — ZooKeeper dynamic service discovery
  • containerization — Kyuubi Containerization Guide
  • type mapping — Kyuubi supports converting Spark results/schemas directly to Thrift results/schemas, bypassing Hive-format results
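As a hedged illustration of the dynamic queue and session-level configuration features, settings can travel in the JDBC connection string a client hands to Kyuubi. The hostname, port, queue name, and conf entries below are placeholders, and the exact placement of conf entries follows the HiveServer2 JDBC URL syntax, so check the Kyuubi docs for your version:

```shell
# Placeholders: host/port must match your Kyuubi frontend
# (spark.kyuubi.frontend.bind.port, 10009 in this README's examples).
KYUUBI_HOST=kyuubi.example.com
KYUUBI_PORT=10009

# spark.yarn.queue selects the YARN queue for this connection;
# spark.sql.* entries become session-level configurations.
JDBC_URL="jdbc:hive2://${KYUUBI_HOST}:${KYUUBI_PORT}/default?spark.yarn.queue=myqueue;spark.sql.shuffle.partitions=100"
echo "${JDBC_URL}"
```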

Getting Started

Packaging

Please refer to the Building Kyuubi in the online documentation for an overview on how to build Kyuubi.

Start Kyuubi

We can start Kyuubi with the built-in startup script bin/start-kyuubi.sh. First of all, export SPARK_HOME in $KYUUBI_HOME/bin/kyuubi-env.sh:

export SPARK_HOME=/the/path/to/a/runnable/spark/binary/dir

Then start Kyuubi with bin/start-kyuubi.sh:

$ bin/start-kyuubi.sh \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.kyuubi.frontend.bind.port=10009

Run Spark SQL on Kyuubi

Now you can use beeline, Tableau, or Thrift-API-based programs to connect to the Kyuubi server.
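For example, connecting with the beeline client shipped with Spark might look like the following. The hostname, database, and user here are placeholders, and the port should match spark.kyuubi.frontend.bind.port; this is a sketch, not a complete reference for beeline's options:

```shell
# Placeholders: adjust host, port, database, and user to your deployment.
$SPARK_HOME/bin/beeline \
  -u "jdbc:hive2://kyuubi.example.com:10009/default" \
  -n your_user \
  -e "SHOW DATABASES"
```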

Stop Kyuubi

bin/stop-kyuubi.sh

Multi Tenancy Support

Prerequisites

Kyuubi may work well with different deployments such as non-secured YARN, Standalone, Mesos, or even local mode, but it is mainly designed for a secured HDFS/YARN cluster, where Kyuubi's multi-tenancy and security features come into play.

Suppose that you already have a secured HDFS cluster for deploying Spark, Hive or other applications.

Configure Yarn

Spark on Yarn

  • Set up Spark on YARN: ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster.
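As a small hedged sketch, the path below is a placeholder for wherever your cluster keeps its client-side Hadoop configs:

```shell
# Placeholder path: point at the directory holding core-site.xml,
# hdfs-site.xml, and yarn-site.xml for your cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf
echo "$HADOOP_CONF_DIR"
```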

Configure Hive

  • Configuration of Hive is done by placing your hive-site.xml, core-site.xml and hdfs-site.xml files in $SPARK_HOME/conf.
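The step above can be sketched as follows; to keep the snippet self-contained, mktemp directories stand in for the real /etc config locations and for $SPARK_HOME/conf, which are the actual source and target in a deployment:

```shell
# Self-contained demo: SRC stands in for e.g. /etc/hive/conf and
# /etc/hadoop/conf; SPARK_CONF stands in for $SPARK_HOME/conf.
SRC=$(mktemp -d)
SPARK_CONF=$(mktemp -d)
for f in hive-site.xml core-site.xml hdfs-site.xml; do
  printf '<configuration/>\n' > "$SRC/$f"   # placeholder config content
  cp "$SRC/$f" "$SPARK_CONF/"
done
ls "$SPARK_CONF"
```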

Configuration

Please refer to the Configuration Guide in the online documentation for an overview on how to configure Kyuubi.

Authentication

Please refer to the Authentication/Security Guide in the online documentation for an overview on how to enable security for Kyuubi.

Additional Documentation

Building Kyuubi
Kyuubi Deployment Guide
Kyuubi Containerization Guide
High Availability Guide
Configuration Guide
Authentication/Security Guide
Kyuubi ACL Management Guide
Kyuubi Architecture