diff --git a/docs/imgs/spark_jobs_page.png b/docs/imgs/spark_jobs_page.png
index de52466cc..8014447d1 100644
Binary files a/docs/imgs/spark_jobs_page.png and b/docs/imgs/spark_jobs_page.png differ
diff --git a/docs/quick_start/quick_start.md b/docs/quick_start/quick_start.md
index bc27c1815..fa89c624c 100644
--- a/docs/quick_start/quick_start.md
+++ b/docs/quick_start/quick_start.md
@@ -16,59 +16,73 @@
-->
+

-# Getting Started with Kyuubi
+# Getting Started with Apache Kyuubi
## Getting Kyuubi
-Currently, Kyuubi maintains all releases on GitHub directly. You can get the most recent stable release of Kyuubi here:
+Currently, Apache Kyuubi maintains all its releases on our official [website](https://kyuubi.apache.org/releases.html).
+You can get the most recent stable release of Apache Kyuubi here:
-Download
+Download
## Requirements
-These are essential components required for Kyuubi to startup. For quick start deployment, the only thing you need is `JAVA_HOME` being correctly set. The Kyuubi release package you downloaded or built contains the rest prerequisites inside already.
+These are the essential components required for Kyuubi to start up.
+For a quick start deployment, the only things you need are `JAVA_HOME` and `SPARK_HOME` being correctly set.
+The Kyuubi release package you downloaded or built already contains the rest of the prerequisites.
Components| Role | Optional | Version | Remarks
--- | --- | --- | --- | ---
-Java | Java Runtime Environment | Required | 1.8 | Kyuubi is pre-built with Java 1.8
-Spark | Distribute SQL Engine | Optional | 3.0 and above | By default Kyuubi is pre-built w/ a Apache Spark release inside at `$KYUUBI_HOME/externals`
+Java | Java Runtime Environment | Required | Java 8/11 | Kyuubi is pre-built with Java 8
+Spark | Distributed SQL Engine | Required | 3.0.0 and above | By default Kyuubi binary release is delivered without a Spark tarball.
HDFS | Distributed File System | Optional | referenced by Spark | Hadoop Distributed File System is a part of Hadoop framework, used to store and process the datasets. You can interact with any Spark-compatible versions of HDFS.
Hive | Metastore | Optional | referenced by Spark | Hive Metastore for Spark SQL to connect
-Zookeeper | Service Discovery | Optional | Any zookeeper ensemble compatible with curator(2.7.1) | By default, Kyuubi provides a embeded Zookeeper server inside for non-production use.
+Zookeeper | Service Discovery | Optional | Any zookeeper ensemble compatible with curator(2.12.0) | By default, Kyuubi provides an embedded Zookeeper server inside for non-production use.
-Additionally, if you want to work with other Spark compatible systems or plugins, you only need to take care of them as using them with regular Spark applications. For example, you can run Spark SQL engines created by the Kyuubi on any cluster manager, including YARN, Kubernetes, Mesos, e.t.c... Or, you can manipulate data from different data sources with the Spark Datasource API, e.g. Delta Lake, Apache Hudi, Apache Iceberg, Apache Kudu and e.t.c...
+Additionally, if you want to work with other Spark-compatible systems or plugins, you only need to handle them as you would with regular Spark applications.
+For example, you can run Spark SQL engines created by Kyuubi on any cluster manager, including YARN, Kubernetes, Mesos, etc.
+Or, you can manipulate data from different data sources with the Spark Datasource API, e.g. Delta Lake, Apache Hudi, Apache Iceberg, Apache Kudu, etc.
## Installation
To install Kyuubi, you need to unpack the tarball. For example,
```bash
-tar zxf kyuubi-1.0.2-bin-spark-3.0.1.tgz
+tar zxf apache-kyuubi-1.3.1-incubating-bin.tgz
```
-This will result in the creation of a subdirectory named `kyuubi-1.0.2-bin-spark-3.0.1` shown below, where the `1.0.2` is the Kyuubi version, and `3.0.1` is the pre-built Spark version.
+This will result in the creation of a subdirectory named `apache-kyuubi-1.3.1-incubating-bin`, shown below:
```bash
-kyuubi-1.0.2-bin-spark-3.0.1
+apache-kyuubi-1.3.1-incubating-bin
+├── DISCLAIMER
├── LICENSE
+├── NOTICE
├── RELEASE
├── bin
-│ └── kyuubi
├── conf
-│ ├── kyuubi-defaults.conf
-│ ├── kyuubi-env.sh
-│ └── log4j.properties
+│ ├── kyuubi-defaults.conf.template
+│ ├── kyuubi-env.sh.template
+│ └── log4j.properties.template
+├── docker
+│ ├── Dockerfile
+│ ├── kyuubi-configmap.yaml
+│ ├── kyuubi-pod.yaml
+│ └── kyuubi-service.yaml
+├── extension
+│ └── kyuubi-extension-spark-3-1_2.12-1.3.1-incubating.jar
├── externals
-│ ├── engines
-│ └── spark-3.0.1-bin-hadoop2.7
+│ └── engines
├── jars
+├── licenses
├── logs
├── pid
└── work
@@ -76,22 +90,23 @@ kyuubi-1.0.2-bin-spark-3.0.1
From top to bottom are:
+- DISCLAIMER: the disclaimer made by Apache Kyuubi Community as a project still in ASF Incubator.
- LICENSE: the [APACHE LICENSE, VERSION 2.0](https://www.apache.org/licenses/LICENSE-2.0) we claim to obey.
-- RELEASE: the build information of this package
+- RELEASE: the build information of this package.
+- NOTICE: the notice made by Apache Kyuubi Community about its project and dependencies.
- bin: the entry of the Kyuubi server with `kyuubi` as the startup script.
- conf: all the defaults used by Kyuubi Server itself or creating a session with Spark applications.
- externals
- engines: contains all kinds of SQL engines that we support, e.g. Apache Spark, Apache Flink(coming soon).
- - spark-3.0.1-bin-hadoop2.7: a pre-downloaded official Spark release, used as default.
+- licenses: the license files included in this package.
- jars: packages needed by the Kyuubi server.
- logs: Where the logs of the Kyuubi server locates.
- pid: stores the PID file of the Kyuubi server instance.
- work: the root of the working directories of all the forked sub-processes, a.k.a. SQL engines.
-
## Running Kyuubi
-As mentioned above, for a quick start deployment, then only you need to be sure is that your java runtime environment is correct.
+As mentioned above, for a quick start deployment, the only thing you need to make sure of is that your Java runtime environment and `SPARK_HOME` are correct.
### Setup JAVA
@@ -112,14 +127,22 @@ export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.5.jdk/Contents/Home
java version "11.0.5" 2019-10-15 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.5+10-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode)
-
```
The recommended place to set `JAVA_HOME` is `$KYUUBI_HOME/conf/kyuubi-env.sh`, as the ways above are too flaky.
The `JAVA_HOME` in `$KYUUBI_HOME/conf/kyuubi-env.sh` will take others' precedence.
-### Starting Kyuubi
+### Setup Spark
+Similar to `JAVA_HOME`, you can also set `SPARK_HOME` in different ways. However, we recommend setting it in `$KYUUBI_HOME/conf/kyuubi-env.sh` too.
+
+For example,
+
+```bash
+SPARK_HOME=~/Downloads/spark-3.2.0-bin-hadoop3.2
+```
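Before starting the server, it may help to confirm that both variables point at usable installations. The following is a minimal sketch, not part of Kyuubi: `check_kyuubi_env` is a hypothetical helper name, and it assumes that a valid `JAVA_HOME` contains `bin/java` and a valid `SPARK_HOME` contains `bin/spark-submit`.

```bash
# Hypothetical helper (not shipped with Kyuubi): sanity-check JAVA_HOME and
# SPARK_HOME before starting the server.
check_kyuubi_env() {
  _status=0
  # JAVA_HOME must be set and contain an executable bin/java.
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "${JAVA_HOME}/bin/java" ]; then
    echo "JAVA_HOME is unset or has no bin/java: '${JAVA_HOME:-}'" >&2
    _status=1
  fi
  # SPARK_HOME must be set and contain an executable bin/spark-submit.
  if [ -z "${SPARK_HOME:-}" ] || [ ! -x "${SPARK_HOME}/bin/spark-submit" ]; then
    echo "SPARK_HOME is unset or has no bin/spark-submit: '${SPARK_HOME:-}'" >&2
    _status=1
  fi
  return "$_status"
}
```

With the function defined in your shell, `check_kyuubi_env && bin/kyuubi start` would only attempt the start when both checks pass.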
+
+### Starting Kyuubi
```bash
bin/kyuubi start
@@ -127,19 +150,19 @@ bin/kyuubi start
It will print all essential environment variables on the screen during the server starts, and you may check whether they are expected.
-```logtalk
-Starting Kyuubi Server from /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1
-Using kyuubi.sh environment file /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1/conf/kyuubi-env.sh to initialize...
+```log
+Starting Kyuubi Server from /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin
+Warn: Not find kyuubi environment file /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin/conf/kyuubi-env.sh, using default ones...
JAVA_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_251.jdk/Contents/Home
-KYUUBI_HOME: /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1
-KYUUBI_CONF_DIR: /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1/conf
-KYUUBI_LOG_DIR: /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1/logs
-KYUUBI_PID_DIR: /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1/pid
-KYUUBI_WORK_DIR_ROOT: /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1/work
-SPARK_HOME: /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1/externals/spark-3.0.1-bin-hadoop2.7
-SPARK_CONF_DIR:
+KYUUBI_HOME: /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin
+KYUUBI_CONF_DIR: /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin/conf
+KYUUBI_LOG_DIR: /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin/logs
+KYUUBI_PID_DIR: /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin/pid
+KYUUBI_WORK_DIR_ROOT: /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin/work
+SPARK_HOME: /Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2
+SPARK_CONF_DIR: /Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/conf
HADOOP_CONF_DIR:
-Starting org.apache.kyuubi.server.KyuubiServer, logging to /Users/kentyao/kyuubi/kyuubi-1.0.2-bin-spark-3.0.1/logs/kyuubi-kentyao-org.apache.kyuubi.server.KyuubiServer-hulk.local.out
+Starting org.apache.kyuubi.server.KyuubiServer, logging to /Users/kentyao/svn-kyuubi/v1.3.1-incubating-rc0/apache-kyuubi-1.3.1-incubating-bin/logs/kyuubi-kentyao-org.apache.kyuubi.server.KyuubiServer-hulk.local.out
Welcome to
__ __ __
/\ \/\ \ /\ \ __
@@ -150,7 +173,6 @@ Welcome to
\/_/\/_/`/___/> \/___/ \/___/ \/___/ \/_/
/\___/
\/__/
-
```
If all goes well, this will result in the creation of the Kyuubi server instance with a `PID` stored in `$KYUUBI_HOME/pid/kyuubi--org.apache.kyuubi.server.KyuubiServer.pid`
@@ -158,7 +180,7 @@ If all goes well, this will result in the creation of the Kyuubi server instance
Then, you can get the JDBC connection URL at the end of the log file, e.g.
```
-FrontendService: Starting and exposing JDBC connection at: jdbc:hive2://localhost:10009/
+ThriftFrontendService: Starting and exposing JDBC connection at: jdbc:hive2://localhost:10009/
```
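If you would rather not scan the log by eye, the URL can be pulled out with standard tools. This is a rough sketch under the assumption that the log line looks like the one above; `kyuubi_jdbc_url` is a hypothetical helper name, and the argument is the log file path printed at startup after `logging to`.

```bash
# Hypothetical helper: print the most recent JDBC URL found in a Kyuubi
# server log file (path passed as the first argument).
kyuubi_jdbc_url() {
  # grep -o prints only the matching URL; tail keeps the last occurrence.
  grep -o 'jdbc:hive2://[^[:space:]]*' "$1" | tail -n 1
}
```

The result can then be handed to beeline, for example `"$SPARK_HOME/bin/beeline" -u "$(kyuubi_jdbc_url /path/to/server.out)"`, where `/path/to/server.out` is a placeholder for your own log file.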
If something goes wrong, you shall be able to find some clues in the log file too.
@@ -172,8 +194,7 @@ bin/kyuubi run
## Using Hive Beeline
-Kyuubi server is compatible with Apache Hive beeline,
-and a builtin beeline tool can be found within the pre-built Spark package in the `$KYUUBI_HOME/externals` directory, e.g. `$KYUUBI_HOME/externals/spark-3.0.1-bin-hadoop2.7/bin/beeline`
+Kyuubi server is compatible with Apache Hive beeline, so you can use `$SPARK_HOME/bin/beeline` for testing.
### Opening a Connection
@@ -191,8 +212,8 @@ Beeline version 2.3.7 by Apache Hive
In this case, the session will create for the user named 'anonymous'.
-Kyuubi will create a Spark SQL engine application using `kyuubi-spark-sql-engine-.jar`.
-It will cost a while for the application to be ready before fully establishing the session.
+Kyuubi will create a Spark SQL engine application using `kyuubi-spark-sql-engine_2.12-.jar`.
+It will take a while for the application to be ready before fully establishing the session.
Otherwise, an existing application will be resued, and the time cost here is negligible.
Similarly, you can create a session for another user(or principal, subject, and maybe something else you defined), e.g. named `kentyao`,
@@ -211,32 +232,68 @@ Then, you can see 3 processes running in your local environment, including one `
72566 SparkSubmit
75356 SparkSubmit
```
+
### Execute Statements
If the beeline session is successfully connected, then you can run any query supported by Spark SQL now. For example,
```logtalk
-0: jdbc:hive2://localhost:10009/> select timestamp '2018-11-17';
-2020-11-02 20:51:49.019 INFO operation.ExecuteStatement:
- Spark application name: kyuubi_kentyao_spark_20:44:57.240
- application ID: local-1604321098626
- application web UI: http://10.242.189.214:64922
+0: jdbc:hive2://10.242.189.214:2181/> select timestamp '2018-11-17';
+2021-10-28 13:56:27.509 INFO operation.ExecuteStatement: Processing kent's query[1f619182-20ad-4733-995b-a5e43b80d998]: INITIALIZED_STATE -> PENDING_STATE, statement: select timestamp '2018-11-17'
+2021-10-28 13:56:27.547 INFO operation.ExecuteStatement: Processing kent's query[1f619182-20ad-4733-995b-a5e43b80d998]: PENDING_STATE -> RUNNING_STATE, statement: select timestamp '2018-11-17'
+2021-10-28 13:56:27.540 INFO operation.ExecuteStatement: Processing kent's query[a46ca504-fe3a-4dfb-be1e-19770af8ac4c]: INITIALIZED_STATE -> PENDING_STATE, statement: select timestamp '2018-11-17'
+2021-10-28 13:56:27.541 INFO operation.ExecuteStatement: Processing kent's query[a46ca504-fe3a-4dfb-be1e-19770af8ac4c]: PENDING_STATE -> RUNNING_STATE, statement: select timestamp '2018-11-17'
+2021-10-28 13:56:27.543 INFO operation.ExecuteStatement:
+ Spark application name: kyuubi_USER_kent_7ad055d0-3eca-4b78-87e8-94b22f3bade9
+ application ID: local-1635400506190
+ application web UI: http://10.242.189.214:56774
master: local[*]
deploy mode: client
- version: 3.0.1
- Start time: 2020-11-02T12:44:57.398Z
- User: kentyao
-2020-11-02 20:51:49.501 INFO codegen.CodeGenerator: Code generated in 13.673142 ms
-2020-11-02 20:51:49.625 INFO spark.SparkContext: Starting job: collect at ExecuteStatement.scala:49
-2020-11-02 20:51:50.129 INFO scheduler.DAGScheduler: Job 0 finished: collect at ExecuteStatement.scala:49, took 0.503838 s
-2020-11-02 20:51:50.151 INFO codegen.CodeGenerator: Code generated in 9.685752 ms
-2020-11-02 20:51:50.228 INFO operation.ExecuteStatement: Processing kentyao's query[d80a2664-342d-4f38-baaa-82e88e68a43b]: RUNNING_STATE -> FINISHED_STATE, statement: select timestamp '2018-11-17', time taken: 1.211 seconds
+ version: 3.2.0
+ Start time: 2021-10-28T13:55:05.528
+ User: kent
+2021-10-28 13:56:27.604 INFO operation.ExecuteStatement: Processing kent's query[a46ca504-fe3a-4dfb-be1e-19770af8ac4c]: RUNNING_STATE -> RUNNING_STATE, statement: select timestamp '2018-11-17'
+2021-10-28 13:56:27.627 INFO codegen.CodeGenerator: Code generated in 6.696179 ms
+2021-10-28 13:56:27.635 INFO spark.SparkContext: Starting job: collect at ExecuteStatement.scala:97
+2021-10-28 13:56:27.639 INFO kyuubi.SQLOperationListener: Query [a46ca504-fe3a-4dfb-be1e-19770af8ac4c]: Job 3 started with 1 stages, 1 active jobs running
+2021-10-28 13:56:27.639 INFO kyuubi.SQLOperationListener: Query [a46ca504-fe3a-4dfb-be1e-19770af8ac4c]: Stage 3 started with 1 tasks, 1 active stages running
+2021-10-28 13:56:27.651 INFO scheduler.DAGScheduler: Job 3 finished: collect at ExecuteStatement.scala:97, took 0.016234 s
+2021-10-28 13:56:27.653 INFO kyuubi.SQLOperationListener: Finished stage: Stage(3, 0); Name: 'collect at ExecuteStatement.scala:97'; Status: succeeded; numTasks: 1; Took: 13 msec
+2021-10-28 13:56:27.663 INFO scheduler.StatsReportListener: task runtime:(count: 1, mean: 8.000000, stdev: 0.000000, max: 8.000000, min: 8.000000)
+2021-10-28 13:56:27.664 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.664 INFO scheduler.StatsReportListener: 8.0 ms 8.0 ms 8.0 ms 8.0 ms 8.0 ms 8.0 ms 8.0 ms 8.0 ms 8.0 ms
+2021-10-28 13:56:27.665 INFO scheduler.StatsReportListener: shuffle bytes written:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
+2021-10-28 13:56:27.665 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.665 INFO scheduler.StatsReportListener: 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B
+2021-10-28 13:56:27.666 INFO scheduler.StatsReportListener: fetch wait time:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
+2021-10-28 13:56:27.666 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.666 INFO scheduler.StatsReportListener: 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms
+2021-10-28 13:56:27.667 INFO scheduler.StatsReportListener: remote bytes read:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
+2021-10-28 13:56:27.667 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.667 INFO scheduler.StatsReportListener: 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B
+2021-10-28 13:56:27.668 INFO scheduler.StatsReportListener: task result size:(count: 1, mean: 1402.000000, stdev: 0.000000, max: 1402.000000, min: 1402.000000)
+2021-10-28 13:56:27.668 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.669 INFO scheduler.StatsReportListener: 1402.0 B 1402.0 B 1402.0 B 1402.0 B 1402.0 B 1402.0 B 1402.0 B 1402.0 B 1402.0 B
+2021-10-28 13:56:27.669 INFO codegen.CodeGenerator: Code generated in 8.815996 ms
+2021-10-28 13:56:27.672 INFO scheduler.StatsReportListener: executor (non-fetch) time pct: (count: 1, mean: 12.500000, stdev: 0.000000, max: 12.500000, min: 12.500000)
+2021-10-28 13:56:27.672 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.672 INFO scheduler.StatsReportListener: 13 % 13 % 13 % 13 % 13 % 13 % 13 % 13 % 13 %
+2021-10-28 13:56:27.673 INFO scheduler.StatsReportListener: fetch wait time pct: (count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
+2021-10-28 13:56:27.673 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.673 INFO scheduler.StatsReportListener: 0 % 0 % 0 % 0 % 0 % 0 % 0 % 0 % 0 %
+2021-10-28 13:56:27.674 INFO scheduler.StatsReportListener: other time pct: (count: 1, mean: 87.500000, stdev: 0.000000, max: 87.500000, min: 87.500000)
+2021-10-28 13:56:27.674 INFO scheduler.StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
+2021-10-28 13:56:27.674 INFO scheduler.StatsReportListener: 88 % 88 % 88 % 88 % 88 % 88 % 88 % 88 % 88 %
+2021-10-28 13:56:27.674 INFO kyuubi.SQLOperationListener: Query [a46ca504-fe3a-4dfb-be1e-19770af8ac4c]: Job 3 succeeded, 0 active jobs running
+2021-10-28 13:56:27.744 INFO operation.ExecuteStatement: Processing kent's query[a46ca504-fe3a-4dfb-be1e-19770af8ac4c]: RUNNING_STATE -> FINISHED_STATE, statement: select timestamp '2018-11-17', time taken: 0.202 seconds
+2021-10-28 13:56:27.784 INFO operation.ExecuteStatement: Query[1f619182-20ad-4733-995b-a5e43b80d998] in FINISHED_STATE
+2021-10-28 13:56:27.784 INFO operation.ExecuteStatement: Processing kent's query[1f619182-20ad-4733-995b-a5e43b80d998]: RUNNING_STATE -> FINISHED_STATE, statement: select timestamp '2018-11-17', time taken: 0.237 seconds
+----------------------------------+
| TIMESTAMP '2018-11-17 00:00:00' |
+----------------------------------+
| 2018-11-17 00:00:00.0 |
+----------------------------------+
-1 row selected (1.466 seconds)
+1 row selected (0.404 seconds)
```
As shown in the above case, you can retrieve all the operation logs, the result schema, and the result to your client-side in the beeline console.
@@ -263,6 +320,22 @@ Stop Kyuubi by running the following in the `$KYUUBI_HOME` directory:
bin/kyuubi.sh stop
```
+And then, you will see the KyuubiServer waving goodbye to you.
+
+```logtalk
+Stopping org.apache.kyuubi.server.KyuubiServer
+ __ __ __
+ /\ \/\ \ /\ \ __
+ \ \ \/'/' __ __ __ __ __ __\ \ \____/\_\
+ \ \ , < /\ \/\ \/\ \/\ \/\ \/\ \\ \ '__`\/\ \
+ \ \ \\`\\ \ \_\ \ \ \_\ \ \ \_\ \\ \ \L\ \ \ \
+ \ \_\ \_\/`____ \ \____/\ \____/ \ \_,__/\ \_\
+ \/_/\/_/`/___/> \/___/ \/___/ \/___/ \/_/
+ /\___/
+ \/__/
+Bye!
+```
+
The `KyuubiServer` instance will be stopped immediately while the SQL engine's application will still be alive for a while.
If you start Kyuubi again before the SQL engine application terminates itself, it will reconnect to the newly created `KyuubiServer` instance.