[KYUUBI #1866][DOCS] Add flink sql engine quick start

### _Why are the changes needed?_

Add quick start documents of the Flink SQL Engine.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #2106 from deadwind4/KYUUBI-1866-quickstart.

Closes #1866

2533aafd [Ada Wong] remove Yarn section
6aa4db8a [Ada Wong] compress png
ff6bff72 [Ada Wong] [KYUUBI #1866][DOCS] Add flink sql engine quick start

Authored-by: Ada Wong <rsl4@foxmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
Ada Wong 2022-03-12 17:07:35 +08:00 committed by Kent Yao
parent b7a5cfcf78
commit 8f7b2c6640
GPG Key ID: F7051850A0AF904D
2 changed files with 140 additions and 19 deletions

Binary image file added (64 KiB; preview not shown).

## Requirements
These are the essential components required for Kyuubi to start up.
For quick start deployment, the only thing you need is `JAVA_HOME` being correctly set.
The Kyuubi release package you downloaded or built already contains the remaining prerequisites.
Components| Role | Optional | Version | Remarks
--- | --- | --- | --- | ---
Java | Java<br>Runtime<br>Environment | Required | Java 8/11 | Kyuubi is pre-built with Java 8
Spark | Distributed<br>SQL<br>Engine | Optional | 3.0.0 and above | By default Kyuubi binary release is delivered without<br> a Spark tarball.
Flink | Distributed<br>SQL<br>Engine | Optional | 1.14.0 and above | By default Kyuubi binary release is delivered without<br> a Flink tarball.
HDFS | Distributed<br>File<br>System | Optional | referenced<br>by<br>Spark | Hadoop Distributed File System is a <br>part of Hadoop framework, used to<br> store and process the datasets.<br> You can interact with any<br> Spark-compatible versions of HDFS.
Hive | Metastore | Optional | referenced<br>by<br>Spark | Hive Metastore for Spark SQL to connect
Zookeeper | Service<br>Discovery | Optional | Any<br>zookeeper<br>ensemble<br>compatible<br>with<br>curator(2.12.0) | By default, Kyuubi provides an<br> embedded Zookeeper server inside for<br> non-production use.
Additionally, if you want to work with other Spark/Flink compatible systems or plugins, you only need to take care of them as you would when using them with regular Spark/Flink applications.
For example, you can run the Spark/Flink SQL engines created by Kyuubi on any cluster manager, including YARN, Kubernetes, Mesos, etc.
Or, you can manipulate data from different data sources with the Spark Datasource/Flink Table API, e.g. Delta Lake, Apache Hudi, Apache Iceberg, Apache Kudu, etc.
## Installation
To install Kyuubi, you need to unpack the tarball. For example,
```bash
tar zxf apache-kyuubi-1.5.0-incubating-bin.tgz
```
This will result in the creation of a subdirectory named `apache-kyuubi-1.5.0-incubating-bin` shown below,
```bash
apache-kyuubi-1.5.0-incubating-bin
├── DISCLAIMER
├── LICENSE
├── NOTICE
├── RELEASE
├── beeline-jars
├── bin
├── conf
│ ├── kyuubi-defaults.conf.template
│ ├── kyuubi-env.sh.template
│ └── log4j2.properties.template
├── docker
│ ├── Dockerfile
│ ├── helm
│ ├── kyuubi-configmap.yaml
│ ├── kyuubi-deployment.yaml
│ ├── kyuubi-pod.yaml
│ └── kyuubi-service.yaml
├── extension
│ └── kyuubi-extension-spark-3-1_2.12-1.3.1-incubating.jar
├── externals
│ └── engines
├── jars
From top to bottom are:
- bin: the entry of the Kyuubi server with `kyuubi` as the startup script.
- conf: all the defaults used by Kyuubi Server itself or creating a session with Spark applications.
- externals
- engines: contains all kinds of SQL engines that we support, e.g. Apache Spark, Apache Flink, Trino (coming soon).
- licenses: a bunch of licenses included.
- jars: packages needed by the Kyuubi server.
- logs: where the logs of the Kyuubi server are located.
## Running Kyuubi
As mentioned above, for a quick start deployment, you only need to make sure the following are set correctly:
- Java runtime environment
- `SPARK_HOME` for the Spark engine
- `FLINK_HOME` and `kyuubi.engine.type` in `$KYUUBI_HOME/conf/kyuubi-defaults.conf` for the Flink engine.
### Setup JAVA
The recommended place to set `JAVA_HOME` is `$KYUUBI_HOME/conf/kyuubi-env.sh`, as the ways above are too flaky.
The `JAVA_HOME` in `$KYUUBI_HOME/conf/kyuubi-env.sh` takes precedence over the others.
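A minimal sketch of persisting the setting there. The install directory and JDK path below are placeholders, not values from this guide:

```shell
# Illustrative only: replace both paths with your real KYUUBI_HOME and JDK home.
KYUUBI_HOME=/tmp/kyuubi-demo
mkdir -p "$KYUUBI_HOME/conf"
# Append the export so every Kyuubi launch picks it up.
echo 'export JAVA_HOME=/usr/lib/jvm/java-11' >> "$KYUUBI_HOME/conf/kyuubi-env.sh"
# Confirm the setting landed in the env file.
grep 'JAVA_HOME' "$KYUUBI_HOME/conf/kyuubi-env.sh"
```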
### Setup Spark
### Spark Engine
#### Setup Spark
Similar to `JAVA_HOME`, you can also set `SPARK_HOME` in different ways. However, we recommend setting it in `$KYUUBI_HOME/conf/kyuubi-env.sh` too.
For example,
SPARK_HOME=~/Downloads/spark-3.2.0-bin-hadoop3.2
```
### Flink Engine
#### Setup Flink
Similar to `JAVA_HOME`, you can also set `FLINK_HOME` in different ways. However, we recommend setting it in `$KYUUBI_HOME/conf/kyuubi-env.sh` too.
For example,
```bash
FLINK_HOME=~/Downloads/flink-1.14.3
```
#### Setup Kyuubi Flink Configuration
To enable the Flink SQL engine, `kyuubi.engine.type` in `$KYUUBI_HOME/conf/kyuubi-defaults.conf` needs to be set to `FLINK_SQL`.
```bash
kyuubi.engine.type FLINK_SQL
```
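A sketch of wiring that up from the shipped template. The paths are placeholders; the real file lives under your actual `$KYUUBI_HOME`:

```shell
# Illustrative only: point KYUUBI_HOME at your real install directory.
KYUUBI_HOME=/tmp/kyuubi-demo
mkdir -p "$KYUUBI_HOME/conf"
# Start from the shipped template when present, otherwise create the file.
cp "$KYUUBI_HOME/conf/kyuubi-defaults.conf.template" \
   "$KYUUBI_HOME/conf/kyuubi-defaults.conf" 2>/dev/null || true
# Select the Flink SQL engine.
echo 'kyuubi.engine.type FLINK_SQL' >> "$KYUUBI_HOME/conf/kyuubi-defaults.conf"
```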
### Starting Kyuubi
```bash
bin/kyuubi run
```
## Using Hive Beeline
Kyuubi server is compatible with Apache Hive beeline, so you can use `$KYUUBI_HOME/bin/beeline` for testing.
### Opening a Connection
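The full connection transcript is elided in this diff; a minimal anonymous connection looks like the following (a sketch, assuming Kyuubi is listening on its default frontend port 10009):

```bash
bin/beeline -u 'jdbc:hive2://localhost:10009/'
```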
If no user is specified in the connection URL, the session will be created for the user named 'anonymous'.
Kyuubi will create a Spark/Flink SQL engine application using `kyuubi-<engine>-sql-engine_2.12-<version>.jar`.
It will take a while for the application to become ready before the session is fully established.
Otherwise, an existing application will be reused, and the time cost here is negligible.
```bash
bin/beeline -u 'jdbc:hive2://localhost:10009/' -n kentyao
```
The formerly created Spark application for user 'anonymous' will not be reused in this case, while a brand new application will be submitted for user 'kentyao' instead.
Then, you can see processes like the following running in your local environment, including one `KyuubiServer` instance and one or more `SparkSubmit` or `FlinkSQLEngine` instances as the SQL engines.
- Spark
```
75730 Jps
70843 KyuubiServer
72566 SparkSubmit
75356 SparkSubmit
```
- Flink
```
43484 Jps
43194 KyuubiServer
43260 FlinkSQLEngine
```
### Execute Statements
#### Execute Spark SQL Statements
If the beeline session is successfully connected, then you can run any query supported by Spark SQL now. For example,
```logtalk
For example, you can get the Spark web UI from the log for debugging or tuning.
![](../imgs/spark_jobs_page.png)
#### Execute Flink SQL Statements
If the beeline session is successfully connected, then you can run any query supported by Flink SQL now. For example,
```logtalk
0: jdbc:hive2://127.0.0.1:10009/default> CREATE TABLE T (
. . . . . . . . . . . . . . . . . . . . . . > a INT,
. . . . . . . . . . . . . . . . . . . . . . > b VARCHAR(10)
. . . . . . . . . . . . . . . . . . . . . . > ) WITH (
. . . . . . . . . . . . . . . . . . . . . . > 'connector.type' = 'filesystem',
. . . . . . . . . . . . . . . . . . . . . . > 'connector.path' = 'file:///tmp/T.csv',
. . . . . . . . . . . . . . . . . . . . . . > 'format.type' = 'csv',
. . . . . . . . . . . . . . . . . . . . . . > 'format.derive-schema' = 'true'
. . . . . . . . . . . . . . . . . . . . . . > );
16:28:47.164 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: INITIALIZED_STATE -> PENDING_STATE, statement: CREATE TABLE T(
a INT,
b VARCHAR(10)
) WITH (
'connector.type' = 'filesystem',
'connector.path' = 'file:///tmp/T.csv',
'format.type' = 'csv',
'format.derive-schema' = 'true'
)
16:28:47.187 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: PENDING_STATE -> RUNNING_STATE, statement: CREATE TABLE T(
a INT,
b VARCHAR(10)
) WITH (
'connector.type' = 'filesystem',
'connector.path' = 'file:///tmp/T.csv',
'format.type' = 'csv',
'format.derive-schema' = 'true'
)
16:28:47.320 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f] in FINISHED_STATE
16:28:47.322 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[22a73e39-d9d7-479b-a118-33f9d2a5ad3f]: RUNNING_STATE -> FINISHED_STATE, statement: CREATE TABLE T(
a INT,
b VARCHAR(10)
) WITH (
'connector.type' = 'filesystem',
'connector.path' = 'file:///tmp/T.csv',
'format.type' = 'csv',
'format.derive-schema' = 'true'
), time taken: 0.134 seconds
+---------+
| result |
+---------+
| OK |
+---------+
1 row selected (0.341 seconds)
0: jdbc:hive2://127.0.0.1:10009/default> INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello');
16:28:52.780 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: INITIALIZED_STATE -> PENDING_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello')
16:28:52.786 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: PENDING_STATE -> RUNNING_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello')
16:28:57.827 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[d79abf78-d2ae-468f-87b2-19db1fc6e19a] in RUNNING_STATE
16:28:59.836 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[d79abf78-d2ae-468f-87b2-19db1fc6e19a] in FINISHED_STATE
16:28:59.837 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[d79abf78-d2ae-468f-87b2-19db1fc6e19a]: RUNNING_STATE -> FINISHED_STATE, statement: INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello'), time taken: 7.05 seconds
+-------------------------------------+
| default_catalog.default_database.T |
+-------------------------------------+
| -1 |
+-------------------------------------+
1 row selected (7.104 seconds)
0: jdbc:hive2://127.0.0.1:10009/default>
0: jdbc:hive2://127.0.0.1:10009/default> SELECT * FROM T;
16:29:08.092 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: INITIALIZED_STATE -> PENDING_STATE, statement: SELECT * FROM T
16:29:08.101 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: PENDING_STATE -> RUNNING_STATE, statement: SELECT * FROM T
16:29:12.519 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f] in FINISHED_STATE
16:29:12.520 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[af5660c0-fcc4-4f80-b3fd-c4a799faf33f]: RUNNING_STATE -> FINISHED_STATE, statement: SELECT * FROM T, time taken: 4.419 seconds
+----+--------+
| a | b |
+----+--------+
| 1 | Hi |
| 2 | Hello |
+----+--------+
2 rows selected (4.466 seconds)
```
As shown in the above case, you can retrieve all the operation logs, the result schema, and the results on the client side in the beeline console.
Additionally, some useful information about the background Flink SQL application associated with this connection is also printed in the operation log.
For example, you can get the Flink web UI from the log for debugging or tuning.
![](../imgs/flink/flink_jobs_page.png)
### Closing a Connection
Close the session between beeline and Kyuubi server by executing `!quit`, for example,
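The original transcript is elided in this diff; a minimal illustration follows (the prompt and `Closing:` line reflect typical beeline behavior and are shown as an assumption; only the trailing `Bye!` is from the original):

```
0: jdbc:hive2://localhost:10009/default> !quit
Closing: 0: jdbc:hive2://localhost:10009/default
Bye!
```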
The `KyuubiServer` instance will be stopped immediately while the SQL engine's application will still be alive for a while.
If you start Kyuubi again before the SQL engine application terminates itself, it will reconnect to the newly created `KyuubiServer` instance.