Kent Yao 3c74836463 readme en / configuration doc kyuui part		2018-03-06 23:52:29 +08:00
bin	mv mvn to build	2018-01-20 00:01:36 +08:00
build	mv mvn to build	2018-01-20 00:01:36 +08:00
docs	readme en / configuration doc kyuui part	2018-03-06 23:52:29 +08:00
if	init commit for kyuubi	2018-01-05 19:38:54 +08:00
patches	readme doc	2018-03-06 11:20:57 +08:00
src	readme en / configuration doc kyuui part	2018-03-06 23:52:29 +08:00
.gitignore	mv mvn to build	2018-01-20 00:01:36 +08:00
.travis.yml	add .travis.yml	2018-03-06 11:35:38 +08:00
LICENSE	Initial commit	2017-12-18 17:05:10 +08:00
pom.xml	add scala test plugin	2018-03-06 14:54:37 +08:00
README_CN.md	readme en / configuration doc kyuui part	2018-03-06 23:52:29 +08:00
README.md	readme en / configuration doc kyuui part	2018-03-06 23:52:29 +08:00
scalastyle-config.xml	1. create sc in a new thread; 2. kill yarn app by app name when sc init timeout	2018-01-17 17:15:35 +08:00

README.md

Kyuubi

Kyuubi is an enhanced edition of Apache Spark's original Thrift JDBC/ODBC Server.

The Thrift JDBC/ODBC Server is a service for Spark SQL similar to Apache Hive's HiveServer2, acting as a distributed query engine through its JDBC/ODBC or command-line interface. In this mode, end users or applications can interact with Spark SQL directly to run SQL queries without writing any code. They can build business reports over massive data with BI tools that support JDBC/ODBC connections, such as Tableau and NetEase YouData. Benefiting from Apache Spark's capabilities, they can achieve much better performance than with Apache Hive as a SQL-on-Hadoop service.

Unfortunately, due to the limitations of Spark's own architecture, the Thrift JDBC/ODBC Server has a number of problems compared with HiveServer2 when used as an enterprise-class product, in areas such as multi-tenant isolation, authentication/authorization, high concurrency, and high availability. In addition, the Apache Spark community's support for this module has stagnated for a long time.

Kyuubi enhances the Thrift JDBC/ODBC Server to address these problems, as shown in the following table:

Feature	Thrift JDBC/ODBC Server	Kyuubi	Comments
Multi SparkContext Instances	✘	✔	Apache Spark has several issues with running multiple SparkContext instances in a single JVM, see here. Setting `spark.driver.allowMultipleContexts=true` only allows SparkContext to be instantiated multiple times; all of these instances still share the scheduler and execution environment of the last one initialized, much like shallow copies of a Java object. Kyuubi's patches provide a way to isolate the scheduler and execution environments per user.
Dynamic SparkContext Initialization	✘	✔	In Kyuubi, SparkContext initialization is deferred until a user session is created, whereas the Thrift JDBC/ODBC Server creates a single SparkContext when it starts.
Dynamic SparkContext Recycling	✘	✔	In the Thrift JDBC/ODBC Server, the SparkContext is a resident variable. Kyuubi keeps a SparkContext instance cached only for a while before the server terminates it.
Dynamic Yarn Queue	✘	✔	spark.yarn.queue specifies the queue in which Spark on Yarn applications run. Once the Thrift JDBC/ODBC Server has started, the queue can no longer be changed, whereas HiveServer2 can switch queues with `set mapred.job.queue.name=thequeue`. Kyuubi adopts a compromise: it recognizes spark.yarn.queue in the connection string and uses it.
Dynamic Configuration	only spark.sql.*	✔	Kyuubi supports setting all Spark/Hive/Hadoop configurations, such as `spark.executor.cores/memory`, in the connection string; they are used to initialize the SparkContext.
Authorization	✘	✘	Spark Authorizer will be added to Kyuubi soon.
Impersonation	`--proxy-user single user`	✔	Kyuubi fully supports `hive.server2.proxy.user` and `hive.server2.doAs`.
Multi Tenancy	✘	✔	Based on the features above, Kyuubi can run as a multi-tenant server on an LCE-enabled Yarn cluster.
SQL Operation Log	✘	✔	Kyuubi redirects the SQL operation log to a local file and provides an interface for clients to fetch it.
High Availability	✘	✔	Based on ZooKeeper
Cluster Deploy Mode	✘	✘	Yarn cluster mode will be supported soon.
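
As a rough illustration of the connection-string features in the table (dynamic queue, dynamic configuration, impersonation), the sketch below assembles a JDBC URL. It assumes Kyuubi accepts the HiveServer2 URL layout `jdbc:hive2://host:port/db;session_vars#vars`, with Spark settings placed after `#`. That placement, the helper name `kyuubi_jdbc_url`, and the host, user, and queue values are assumptions for illustration and are not taken from this README.

```shell
#!/bin/sh
# Hypothetical helper: build a HiveServer2-style JDBC URL for Kyuubi.
# Assumption: Spark settings go after '#', separated by ';'.
# Usage: kyuubi_jdbc_url HOST PORT PROXY_USER [KEY=VALUE ...]
kyuubi_jdbc_url() {
    host=$1; port=$2; proxy_user=$3
    shift 3
    url="jdbc:hive2://${host}:${port}/;hive.server2.proxy.user=${proxy_user}"
    sep='#'
    # Append each KEY=VALUE pair; the first is introduced by '#', the rest by ';'.
    for kv in "$@"; do
        url="${url}${sep}${kv}"
        sep=';'
    done
    printf '%s\n' "$url"
}

# Placeholder host, user, and queue.
kyuubi_jdbc_url kyuubi.example.com 10009 alice \
    spark.yarn.queue=thequeue spark.executor.cores=2
```

The resulting string can be passed to any JDBC client; whether a given setting takes effect depends on the Kyuubi version in use.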

Getting Started

Packaging

Kyuubi server is built with Maven:

build/mvn clean package

Running the command above in the Kyuubi project directory is all that is needed to build a runnable Kyuubi server.

Start Kyuubi

1. As a normal Spark application

For testing, you can run Kyuubi Server as a normal Spark application.

$ $SPARK_HOME/bin/spark-submit \
    --class yaooqinn.kyuubi.server.KyuubiServer \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.hadoop.hive.server2.thrift.port=10009 \
    $KYUUBI_HOME/target/kyuubi-1.0.0-SNAPSHOT.jar

2. As a long-running service

Use nohup and & to run Kyuubi as a long-running service:

$ nohup $SPARK_HOME/bin/spark-submit \
    --class yaooqinn.kyuubi.server.KyuubiServer \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.hadoop.hive.server2.thrift.port=10009 \
    $KYUUBI_HOME/target/kyuubi-1.0.0-SNAPSHOT.jar &
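
When running Kyuubi this way, it is usually convenient to keep the server output and the process ID so that the service can be inspected or stopped later. A minimal sketch, assuming a POSIX shell; the helper name `run_detached` and the log and PID file paths are hypothetical and not part of Kyuubi:

```shell
#!/bin/sh
# Hypothetical helper: start a command with nohup, send its output to a log
# file, and record its PID so it can be stopped later.
# Usage: run_detached LOG_FILE PID_FILE COMMAND [ARGS ...]
run_detached() {
    log_file=$1; pid_file=$2
    shift 2
    nohup "$@" >"$log_file" 2>&1 &
    echo $! >"$pid_file"
}

# Example with the spark-submit command shown above (file paths are assumptions):
# run_detached /tmp/kyuubi.out /tmp/kyuubi.pid \
#     "$SPARK_HOME/bin/spark-submit" \
#     --class yaooqinn.kyuubi.server.KyuubiServer \
#     --master yarn --deploy-mode client --driver-memory 10g \
#     --conf spark.hadoop.hive.server2.thrift.port=10009 \
#     "$KYUUBI_HOME/target/kyuubi-1.0.0-SNAPSHOT.jar"
#
# Stop it later with: kill "$(cat /tmp/kyuubi.pid)"
```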

3. With built-in startup script

The recommended way is to use the built-in startup script bin/start-kyuubi.sh. First, export SPARK_HOME in $KYUUBI_HOME/bin/kyuubi-env.sh:

export SPARK_HOME=/the/path/to/a/runnable/spark/binary/dir

Then start Kyuubi with bin/start-kyuubi.sh:

$ bin/start-kyuubi.sh \
    --master yarn \
    --deploy-mode client \
    --driver-memory 10g \
    --conf spark.hadoop.hive.server2.thrift.port=10009

Run Spark SQL on Kyuubi

Now you can use beeline, Tableau or Thrift API based programs to connect to Kyuubi server.
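
For instance, a beeline session might be opened as sketched below. The host `localhost`, the user `alice`, and the query are placeholders; port 10009 matches the `hive.server2.thrift.port` value used in the start commands above. `-u`, `-n`, and `-e` are standard beeline options.

```shell
#!/bin/sh
# Placeholder values; override them through the environment as needed.
KYUUBI_HOST=${KYUUBI_HOST:-localhost}
KYUUBI_PORT=${KYUUBI_PORT:-10009}
JDBC_URL="jdbc:hive2://${KYUUBI_HOST}:${KYUUBI_PORT}/"

# Connect as the placeholder user "alice" and run a single statement.
if command -v beeline >/dev/null 2>&1; then
    beeline -u "$JDBC_URL" -n alice -e 'SHOW DATABASES;'
else
    echo "beeline not found on PATH; would connect to $JDBC_URL" >&2
fi
```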

Stop Kyuubi

bin/stop-kyuubi.sh

Note: Without the patches we supply, Kyuubi is mostly the same as the Thrift JDBC/ODBC Server, that is, a non-multi-tenant server.

Multi Tenancy Support

Prerequisites

Spark On Yarn
  + Setup Spark On Yarn
  + LinuxContainerExecutor
  + Yarn queues (Optional)
Thrift JDBC/ODBC Server Configurations
  + Configuration of Hive is done by placing your hive-site.xml, core-site.xml and hdfs-site.xml files in $SPARK_HOME/conf/.
Patch
