Add doc for overview

Kent Yao 2020-11-13 14:17:42 +08:00
parent fb4bace6a5
commit 4396e59abe
9 changed files with 108 additions and 18 deletions

@@ -19,7 +19,7 @@
#
# kyuubi.authentication NONE
# kyuubi.frontend.bind.port 10009
#
## Spark Configurations
#
# spark.master local
@@ -28,4 +28,4 @@
## Hadoop Configurations
#
# kyuubi.hadoop.authentication KERBEROS
#
#

@@ -6,7 +6,7 @@ Deploying Kyuubi
.. toctree::
   :maxdepth: 2
   :numbered: 3
   :numbered: 4
   settings
   on_yarn

@@ -65,9 +65,13 @@ the QUEUE configured at Kyuubi server side will be used as default.
#### Sizing
Pass the configurations below through the JDBC connection string to set how many Spark executor instances will be used,
and how many CPUs and how much memory the Spark driver, ApplicationMaster and each executor will take.
Name | Default | Meaning
--- | --- | ---
spark.executor.instances | 1 | The number of executors for static allocation
spark.executor.cores | 1 | The number of cores to use on each executor
spark.yarn.am.memory | 512m | Amount of memory to use for the YARN Application Master in client mode
spark.yarn.am.memoryOverhead | amMemory * 0.10, with minimum of 384 | Amount of non-heap memory to be allocated per AM process in client mode
spark.driver.memory | 1g | Amount of memory to use for the driver process
@@ -75,15 +79,26 @@ spark.driver.memoryOverhead | driverMemory * 0.10, with minimum of 384 | Amount
spark.executor.memory | 1g | Amount of memory to use for the executor process
spark.executor.memoryOverhead | executorMemory * 0.10, with minimum of 384 | Amount of additional memory to be allocated per executor process. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc.
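The defaults in the sizing table combine into a simple rule for each container: the requested memory plus an overhead of 10% of it, with a floor of 384 MiB. The Scala sketch below works through that arithmetic. It assumes the 0.10 factor and 384 MiB minimum from the table and ignores any rounding YARN applies to requests (for example up to a multiple of `yarn.scheduler.minimum-allocation-mb`), so the results are approximate lower bounds, not what YARN will report.

```scala
// Sketch of the memory-overhead rule from the sizing table (assumed, not Spark's code):
// overhead = max(memory * 0.10, 384 MiB), container = memory + overhead.
object ContainerSizing {
  val OverheadFactor = 0.10 // "memory * 0.10" from the table
  val OverheadMinMiB = 384L // "with minimum of 384" (MiB)

  /** Default overhead in MiB for a process whose memory setting is `memoryMiB`. */
  def overheadMiB(memoryMiB: Long): Long =
    math.max((memoryMiB * OverheadFactor).toLong, OverheadMinMiB)

  /** MiB requested from YARN for one container, before any YARN-side rounding. */
  def containerMiB(memoryMiB: Long): Long = memoryMiB + overheadMiB(memoryMiB)
}

println(ContainerSizing.containerMiB(1024)) // default spark.executor.memory = 1g
println(ContainerSizing.containerMiB(512))  // default spark.yarn.am.memory = 512m
println(ContainerSizing.containerMiB(8192)) // an 8g executor
```

With the defaults, a 1g executor asks for 1024 + 384 = 1408 MiB and the 512m client-mode AM for 896 MiB; the 10% term only exceeds the 384 MiB floor above 3840 MiB (3.75g).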
It is recommended to use [Dynamic Allocation](http://spark.apache.org/docs/3.0.1/configuration.html#dynamic-allocation) with Kyuubi,
since the SQL engine is long-running, executes users' queries from clients aperiodically,
and the demand for computing resources varies from query to query.
It is better for Spark to release some executors when the query is lightweight or the SQL engine is idle.
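As an illustration of passing Spark configurations through the JDBC connection string, the Scala sketch below assembles a Hive-style JDBC URL that turns on dynamic allocation. It assumes Kyuubi reads Spark properties from the `#` variables section of a `jdbc:hive2://host:port/;#key=value;...` URL; `KyuubiJdbcUrl`, the host, the port and the executor bounds are placeholders. The property names are standard Spark 3.0 settings; on YARN, dynamic allocation also needs the external shuffle service (`spark.shuffle.service.enabled`) or shuffle tracking.

```scala
// Hypothetical helper: builds jdbc:hive2://host:port/;#k1=v1;k2=v2 (assumed Kyuubi URL layout).
object KyuubiJdbcUrl {
  def build(host: String, port: Int, sparkConfs: Seq[(String, String)]): String = {
    val confPart = sparkConfs.map { case (k, v) => s"$k=$v" }.mkString(";")
    if (confPart.isEmpty) s"jdbc:hive2://$host:$port/"
    else s"jdbc:hive2://$host:$port/;#$confPart"
  }
}

// Standard Spark dynamic-allocation properties; the bounds are example values.
val dynamicAllocation = Seq(
  "spark.dynamicAllocation.enabled" -> "true",
  "spark.dynamicAllocation.minExecutors" -> "0",
  "spark.dynamicAllocation.maxExecutors" -> "10",
  "spark.shuffle.service.enabled" -> "true"
)

println(KyuubiJdbcUrl.build("localhost", 10009, dynamicAllocation))
```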
#### Tuning
You can specify `spark.yarn.archive` or `spark.yarn.jars` to point to a world-readable location on HDFS that contains the Spark jars,
which allows YARN to cache them on nodes so that they don't need to be distributed each time an application runs.
####
#### Others
Acceptable [Spark properties](http://spark.apache.org/docs/latest/running-on-yarn.html#spark-properties)
Please refer to [Spark properties](http://spark.apache.org/docs/latest/running-on-yarn.html#spark-properties) to check other acceptable configs.
## Kerberos
Kyuubi currently does not support Spark's [YARN-specific Kerberos Configuration](http://spark.apache.org/docs/3.0.1/running-on-yarn.html#kerberos),
so `spark.kerberos.keytab` and `spark.kerberos.principal` should not be used for now.
Instead, you can schedule a periodic `kinit` process via a `crontab` task on the local machine that hosts the Kyuubi server, or simply use [Kyuubi Kinit](settings.html#kinit).

@@ -101,11 +101,9 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
Key | Default | Meaning | Since
--- | --- | --- | ---
kyuubi\.authentication|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>NONE</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>Client authentication types.<ul> <li>NONE: no authentication check.</li> <li>KERBEROS: Kerberos/GSSAPI authentication.</li> <li>LDAP: Lightweight Directory Access Protocol authentication.</li></ul></div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.authentication<br>\.keytab|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>&lt;undefined&gt;</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>Location of Kyuubi server's keytab.</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.authentication<br>\.ldap\.base\.dn|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>&lt;undefined&gt;</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>LDAP base DN.</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.authentication<br>\.ldap\.domain|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>&lt;undefined&gt;</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>LDAP domain.</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.authentication<br>\.ldap\.url|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>&lt;undefined&gt;</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>SPACE character separated LDAP connection URL(s).</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.authentication<br>\.principal|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>&lt;undefined&gt;</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>Name of the Kerberos principal.</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.authentication<br>\.sasl\.qop|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>auth</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>SASL QOP enables higher levels of protection for Kyuubi communication with clients.<ul> <li>auth - authentication only (default)</li> <li>auth-int - authentication plus integrity protection</li> <li>auth-conf - authentication plus integrity and confidentiality protection. This is applicable only if Kyuubi is configured to use Kerberos authentication.</li> </ul></div>|<div style='width: 20pt'>1.0.0</div>
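The authentication type and the SASL QOP level constrain each other. A hypothetical validation helper might check them as in the Scala sketch below; `AuthSettings` and its messages are made up for illustration, and treating both `auth-int` and `auth-conf` as Kerberos-only is an assumption (the table states it only for `auth-conf`).

```scala
// Hypothetical check of kyuubi.authentication against kyuubi.authentication.sasl.qop.
object AuthSettings {
  val AuthTypes = Set("NONE", "KERBEROS", "LDAP")
  val QopLevels = Set("auth", "auth-int", "auth-conf")

  /** Normalizes both values, or explains why the combination is rejected. */
  def validate(authType: String, qop: String): Either[String, (String, String)] = {
    val auth = authType.trim.toUpperCase
    val level = qop.trim.toLowerCase
    if (!AuthTypes.contains(auth)) Left(s"unknown authentication type: $authType")
    else if (!QopLevels.contains(level)) Left(s"unknown SASL QOP: $qop")
    // Assumption: any protection beyond plain "auth" requires Kerberos.
    else if (level != "auth" && auth != "KERBEROS") Left(s"$level requires KERBEROS authentication")
    else Right((auth, level))
  }
}

println(AuthSettings.validate("kerberos", "auth-conf"))
println(AuthSettings.validate("LDAP", "auth-conf"))
```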
### Delegation
@@ -149,7 +147,9 @@ kyuubi\.ha\.zookeeper<br>\.session\.timeout|<div style='width: 80pt;word-wrap: b
Key | Default | Meaning | Since
--- | --- | --- | ---
kyuubi\.kinit\.interval|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>PT1H</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>How often the Kyuubi server runs `kinit -kt [keytab] [principal]` to renew the local Kerberos credentials cache</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.kinit\.keytab|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>&lt;undefined&gt;</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>Location of Kyuubi server's keytab.</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.kinit\.max<br>\.attempts|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>10</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>How many times the `kinit` process will retry</div>|<div style='width: 20pt'>1.0.0</div>
kyuubi\.kinit<br>\.principal|<div style='width: 80pt;word-wrap: break-word;white-space: normal'>&lt;undefined&gt;</div>|<div style='width: 200pt;word-wrap: break-word;white-space: normal'>Name of the Kerberos principal.</div>|<div style='width: 20pt'>1.0.0</div>
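The `kyuubi.kinit.*` settings describe a periodic job: every `kyuubi.kinit.interval` (an ISO-8601 duration, `PT1H` by default) run `kinit -kt [keytab] [principal]`, retrying up to `kyuubi.kinit.max.attempts` times. The Scala sketch below models that loop with `java.time.Duration` and a single-threaded scheduler. `KinitSketch` is a hypothetical illustration, not Kyuubi's implementation, and retrying immediately without a back-off delay is a simplification.

```scala
import java.time.Duration
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.sys.process.Process
import scala.util.Try

// Hypothetical model of the periodic kinit job; not Kyuubi's actual code.
object KinitSketch {
  /** The command from the table: kinit -kt [keytab] [principal]. */
  def kinitCommand(keytab: String, principal: String): Seq[String] =
    Seq("kinit", "-kt", keytab, principal)

  /** Calls `attempt` until it returns exit code 0 or `maxAttempts` is reached.
    * Returns (attempts made, whether the last attempt succeeded). */
  def runWithRetries(maxAttempts: Int)(attempt: () => Int): (Int, Boolean) = {
    var attempts = 0
    var succeeded = false
    while (!succeeded && attempts < maxAttempts) {
      attempts += 1
      succeeded = attempt() == 0
    }
    (attempts, succeeded)
  }

  /** Runs kinit now and then every `interval`; the caller shuts the scheduler down. */
  def schedule(interval: Duration, keytab: String, principal: String,
               maxAttempts: Int): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task: Runnable = () => {
      runWithRetries(maxAttempts) { () =>
        // A missing kinit binary throws; count that as a failed attempt (exit code 1).
        Try(Process(kinitCommand(keytab, principal)).!).getOrElse(1)
      }
      ()
    }
    scheduler.scheduleAtFixedRate(task, 0L, interval.toMillis, TimeUnit.MILLISECONDS)
    scheduler
  }
}

// Example (not executed here): KinitSketch.schedule(Duration.parse("PT1H"), "/path/to/kyuubi.keytab", "kyuubi/host@REALM", 10)
println(Duration.parse("PT1H").toMillis)
```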
### Operation

@@ -10,7 +10,6 @@
Welcome to Kyuubi's documentation!
==================================
.. toctree::
   :maxdepth: 2
   :glob:

@@ -1,12 +1,12 @@
.. image:: ../imgs/kyuubi_logo.png
.. image:: ../imgs/kyuubi.png
   :align: center
Overview
===========
========
.. toctree::
   :maxdepth: 2
   :numbered: 2
   summary
   kyuubi_vs_hive

@@ -1,3 +1,65 @@
# What is Kyuubi
# Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing. Currently, kyuubi use Apache Spark as SQL engine.
Kyuubi™ is a unified multi-tenant JDBC interface for large-scale data processing, built on top of [Apache Spark™](http://spark.apache.org/).
![](../imgs/kyuubi_layers.png)
In general, the complete ecosystem of Kyuubi falls into the layers shown in the figure above, with each layer loosely coupled to the others.
For example, you can use Kyuubi, Spark and [Apache Iceberg](https://iceberg.apache.org/) to build and manage a Data Lake with pure SQL.
Kyuubi provides the following features:
## Multi-tenancy
Kyuubi supports end-to-end multi-tenancy,
which is why we created this project even though the Spark [Thrift JDBC/ODBC server](http://spark.apache.org/docs/latest/sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server) already exists.
1. Supports multi-client concurrency and authentication
2. Supports one Spark application per account (SPA)
3. Supports QUEUE/NAMESPACE Access Control Lists (ACL)
4. Supports metadata & data Access Control Lists
Users with valid accounts can use all kinds of client tools, e.g.
Hive Beeline, [HUE](https://gethue.com/), [DBeaver](https://dbeaver.io/),
[SQuirreL SQL Client](http://squirrel-sql.sourceforge.net/), etc.,
to work with the Kyuubi server concurrently.
The SPA policy makes sure that 1) a user account can only obtain computing resources under managed ACLs, e.g.
[Queue Access Control Lists](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Queue_Access_Control_Lists),
from cluster managers, e.g.
[Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) or
[Kubernetes (K8s)](https://kubernetes.io/), to create its Spark application;
and 2) a user account can only access data and metadata in a storage system, e.g.
[Apache Hadoop HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html),
that it has permissions for.
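The SPA policy boils down to a per-account lookup: the first session of an account launches its Spark application, and later sessions of the same account reuse it. The Scala sketch below expresses that with `ConcurrentHashMap.computeIfAbsent`, which applies the launch function at most once per key even when sessions arrive concurrently. `EngineRef`, `EngineRegistry` and the `launch` callback are hypothetical names; a real launch submits an application to a cluster manager and is far slower than this in-memory stand-in.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical handle to a per-user Spark application.
final case class EngineRef(user: String, appName: String)

// Hypothetical registry enforcing one engine per user account.
class EngineRegistry(launch: String => EngineRef) {
  private val engines = new ConcurrentHashMap[String, EngineRef]()

  /** Returns the user's engine, launching it only on the first request. */
  def engineFor(user: String): EngineRef =
    engines.computeIfAbsent(user, u => launch(u))

  def size: Int = engines.size()
}

val registry = new EngineRegistry(u => EngineRef(u, s"kyuubi_engine_$u"))
println(registry.engineFor("alice"))
println(registry.engineFor("alice") eq registry.engineFor("alice"))
```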
## Ease of Use
You only need to be familiar with Structured Query Language (SQL) and Java Database Connectivity (JDBC) to handle massive data.
This lets you focus on the design and implementation of your business system.
- SQL is the standard language for accessing relational databases, and is very popular in the big data ecosystem too.
It turns out that almost everybody knows SQL.
- JDBC provides a standard API for tool/database developers and makes it possible to write database applications using a pure Java API.
- There are plenty of free or commercial JDBC tools out there.
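Since Kyuubi is reached through plain JDBC, a client needs nothing beyond `java.sql` calls. The Scala sketch below opens a connection, runs one query and collects the first column. It assumes a Hive JDBC driver (for example `hive-jdbc`) is on the classpath and a Kyuubi server listens on `localhost:10009`, the port from the configuration template; `JdbcQuery` and `firstColumn` are illustrative names. Connection failures, including a missing driver, come back as a `Left` holding the error message.

```scala
import java.sql.{DriverManager, SQLException}
import scala.collection.mutable.ListBuffer

// Illustrative JDBC client; assumes a Hive JDBC driver on the classpath.
object JdbcQuery {
  /** Runs `sql` and returns the first column of every row, or the error message. */
  def firstColumn(url: String, user: String, sql: String): Either[String, List[String]] =
    try {
      val conn = DriverManager.getConnection(url, user, "")
      try {
        val stmt = conn.createStatement()
        try {
          val rs = stmt.executeQuery(sql)
          val rows = ListBuffer.empty[String]
          while (rs.next()) rows += rs.getString(1)
          Right(rows.toList)
        } finally stmt.close()
      } finally conn.close()
    } catch {
      case e: SQLException => Left(String.valueOf(e.getMessage))
    }
}

// Against a running Kyuubi server (assumed address):
// JdbcQuery.firstColumn("jdbc:hive2://localhost:10009/", "anonymous", "SELECT 1")
println(JdbcQuery.firstColumn("jdbc:hive2://127.0.0.1:1/", "anonymous", "SELECT 1").isLeft)
```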
## Run Anywhere
Kyuubi can submit Spark applications to all supported cluster managers, including YARN, Mesos, Kubernetes, Standalone, and local.
The SPA policy also makes it possible for you to launch different applications against different cluster managers.
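Which cluster manager an engine lands on is decided by its `spark.master` value. The Scala sketch below classifies the standard Spark master URL forms (`yarn`, `k8s://...`, `mesos://...`, `spark://...`, `local`/`local[...]`). The hypothetical `perUser` map shows how, with SPA, different accounts could in principle target different masters; it is an illustration only, not how Kyuubi chooses a master, and the host names are placeholders.

```scala
// Classifies standard Spark master URLs by cluster manager (illustrative helper).
object ClusterManagers {
  def of(master: String): String = master match {
    case "yarn"                                      => "YARN"
    case m if m.startsWith("k8s://")                 => "Kubernetes"
    case m if m.startsWith("mesos://")               => "Mesos"
    case m if m.startsWith("spark://")               => "Standalone"
    case m if m == "local" || m.startsWith("local[") => "local"
    case other => throw new IllegalArgumentException(s"Unrecognized master URL: $other")
  }
}

// Hypothetical per-account masters.
val perUser = Map(
  "alice" -> "yarn",
  "bob" -> "k8s://https://k8s-apiserver.example.com:6443"
)
perUser.foreach { case (user, master) => println(s"$user -> ${ClusterManagers.of(master)}") }
```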
## High Performance
Kyuubi is built on Apache Spark, a lightning-fast unified analytics engine.
- **Concurrent execution**: multiple Spark applications work together
- **Quick response**: long-running Spark applications, so queries do not wait for application startup
- **Optimal execution plan**: fully supports the Spark SQL Catalyst Optimizer
## Authentication & Authorization
## High Availability

@@ -0,0 +1,14 @@
<div align=center>
![](../imgs/kyuubi_logo_simple.png)
</div>
# Getting Started With Cloudera Hue
docker run -it -p 8888:8888 gethue/hue:latest
http://localhost:8888/
![](../imgs/hue_login.png)

@@ -165,13 +165,13 @@ object KyuubiConf {
.stringConf
.createWithDefault("embedded_zookeeper")
val SERVER_PRINCIPAL: OptionalConfigEntry[String] = buildConf("authentication.principal")
val SERVER_PRINCIPAL: OptionalConfigEntry[String] = buildConf("kinit.principal")
.doc("Name of the Kerberos principal.")
.version("1.0.0")
.stringConf
.createOptional
val SERVER_KEYTAB: OptionalConfigEntry[String] = buildConf("authentication.keytab")
val SERVER_KEYTAB: OptionalConfigEntry[String] = buildConf("kinit.keytab")
.doc("Location of Kyuubi server's keytab.")
.version("1.0.0")
.stringConf