[KYUUBI #321] Add Terminologies Documentation



### _Which issue are you going to fix?_

Fixes #${ID}

### _Why are the changes needed?_

Add terminology documentation to improve understanding for both users and developers.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate
![image](https://user-images.githubusercontent.com/8326978/106347018-278d9880-62f6-11eb-8dcf-8125ce69c403.png)

- [ ] [Run test](https://kyuubi.readthedocs.io/en/latest/tools/testing.html#running-tests) locally before making a pull request

Closes #321 from yaooqinn/terminology.

2a6c7e7 [jhx1008] Add Terminologies Documentation

Authored-by: jhx1008 <jhx1008@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
jhx1008 2021-01-30 12:28:42 +08:00 committed by Kent Yao
parent af58331f67
commit 97c0ff8395
4 changed files with 172 additions and 1 deletions


@ -28,7 +28,7 @@ It embraces Spark and builds an ecosystem on top of it,
which allows Kyuubi to quickly expand its existing ecosystem and introduce new features,
such as cloud-native support and `Data Lake/Lake House` support.
Kyuubi's vision is to build on top of Apache Spark and Data Lake technologies to unify the portal and become an ideal data lake management platform.
The vision of Kyuubi vision is to build on top of Apache Spark and Data Lake technologies to unify the portal and become an ideal data lake management platform.
It can support data processing e.g. ETL, and analytics e.g. BI in a pure SQL way.
All workloads can be done on one platform, using one copy of data, with one SQL interface.

12
docs/appendix/index.rst Normal file

@ -0,0 +1,12 @@
.. image:: ../imgs/kyuubi_logo.png
   :align: center

Appendixes
==========

.. toctree::
   :maxdepth: 3
   :numbered: 4

   terminology


@ -0,0 +1,153 @@
<div align=center>
![](../imgs/kyuubi_logo_simple.png)
</div>
# Terminologies
## Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark.
### JDBC
> The Java Database Connectivity (JDBC) API is the industry standard for database-independent connectivity between the Java programming language and a wide range of databases: SQL databases and other tabular data sources,
> such as spreadsheets or flat files.
> The JDBC API provides a call-level API for SQL-based database access.
> JDBC technology allows you to use the Java programming language to exploit "Write Once, Run Anywhere" capabilities for applications that require access to enterprise data.
> With a JDBC technology-enabled driver, you can connect all corporate data even in a heterogeneous environment.
<p align=right>
<em>
<a href="https://www.oracle.com/java/technologies/javase/javase-tech-database.html">https://www.oracle.com/java/technologies/javase/javase-tech-database.html</a>
</em>
</p>
Typically, there is a gap between business development and big data analytics.
If the two are forcefully coupled, it would make the corresponding system difficult to operate and optimize.
On the flip side, if decoupled, the values of both can be maximized.
Business experts can stay focused on their own business development,
while Big Data engineers can continuously optimize server-side performance and stability.
Kyuubi combines the two seamlessly through an easy-to-use JDBC interface.
#### Apache Hive
> The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
<p align=right>
<em>
<a href="https://hive.apache.org/">https://hive.apache.org</a>
</em>
</p>
Kyuubi supports the Hive JDBC driver, which helps you seamlessly migrate your slow queries from Hive to Spark SQL.
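As a rough illustration of what connecting through the Hive JDBC driver involves, the sketch below assembles a `jdbc:hive2://` connection URL. The helper name `kyuubi_jdbc_url`, the host `kyuubi.example.com`, and the port `10009` are assumptions made for this example; use the values of your own deployment.

```python
def kyuubi_jdbc_url(host: str, port: int, database: str = "default") -> str:
    """Build a Hive-style JDBC URL (hypothetical helper, not part of Kyuubi)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# Host and port below are placeholders for illustration only.
url = kyuubi_jdbc_url("kyuubi.example.com", 10009)
print(url)
```

A URL of this shape can then be handed to any Hive JDBC client, for example Beeline via its `-u` option.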
#### Apache Thrift
> The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
<p align=right>
<em>
<a href="https://thrift.apache.org/">https://thrift.apache.org</a>
</em>
</p>
### Server
Server is a daemon process that handles concurrent connection and query requests and converts these requests into various operations against the **query engines** to complete the responses to clients.
_**Aliases: Kyuubi Server / Kyuubi Instance / k.i.**_
### ServerSpace
A ServerSpace is used to register servers and expose them together as a service layer to clients.
### Engine
An engine handles all queries through Kyuubi servers.
It is created by one Kyuubi server and can be shared with other Kyuubi servers by registering itself to an engine namespace.
All its capabilities are mainly powered by Spark SQL.
_**Aliases: Query Engine / Engine Instance / e.i.**_
### EngineSpace
An EngineSpace is internally used by servers to register and interact with engines.
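The relationship between servers, engines, and an EngineSpace can be pictured with a toy in-memory model. This is only a sketch under stated assumptions, not how Kyuubi implements it: the classes `Engine` and `EngineSpace`, the helper `get_or_create_engine`, and the lookup key `"alice"` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Engine:
    """Toy stand-in for a query engine; `owner` is the server that created it."""
    name: str
    owner: str

    def register_in(self, space: "EngineSpace", key: str) -> None:
        # The engine registers itself so other servers can find it.
        space.engines[key] = self


@dataclass
class EngineSpace:
    """Toy in-memory namespace that servers use to look up engines."""
    engines: Dict[str, Engine] = field(default_factory=dict)

    def lookup(self, key: str) -> Optional[Engine]:
        return self.engines.get(key)


def get_or_create_engine(space: EngineSpace, server: str, key: str) -> Engine:
    """Reuse a registered engine if present; otherwise create one and let it register."""
    engine = space.lookup(key)
    if engine is None:
        engine = Engine(name=f"engine-for-{key}", owner=server)
        engine.register_in(space, key)
    return engine


space = EngineSpace()
first = get_or_create_engine(space, "server-1", "alice")   # created by server-1
second = get_or_create_engine(space, "server-2", "alice")  # reused by server-2
print(first is second)
```

In this model the second server finds the engine that the first server created, which mirrors the sharing described above.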
#### Apache Spark
> [Apache Spark™](https://spark.apache.org/) is a unified analytics engine for large-scale data processing.
<p align=right>
<em>
<a href="https://spark.apache.org">https://spark.apache.org</a>
</em>
</p>
### Multi Tenancy
Kyuubi guarantees end-to-end multi-tenant isolation and sharing in the following pipeline:
```
Client --> Kyuubi --> Query Engine(Spark) --> Resource Manager --> Data Storage Layer
```
### High Availability / Load Balance
As an enterprise service, SLA commitment is essential. Deploying Kyuubi in High Availability (HA) mode helps you guarantee that.
#### Apache ZooKeeper
> Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.
<p align=right>
<em>
<a href="https://zookeeper.apache.org/">https://zookeeper.apache.org</a>
</em>
</p>
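In an HA deployment, clients usually discover a server through ZooKeeper instead of connecting to a fixed address. The sketch below assembles such a URL, assuming the ZooKeeper service discovery parameters of the Hive JDBC driver (`serviceDiscoveryMode`, `zooKeeperNamespace`). The helper name `ha_jdbc_url`, the ZooKeeper hosts, and the namespace `kyuubi` are assumptions for illustration only.

```python
from typing import Sequence


def ha_jdbc_url(zk_quorum: Sequence[str], namespace: str) -> str:
    """Build a service-discovery JDBC URL (hypothetical helper, not part of Kyuubi)."""
    quorum = ",".join(zk_quorum)
    return (
        f"jdbc:hive2://{quorum}/;"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


# The ZooKeeper addresses and the namespace are placeholders.
ha_url = ha_jdbc_url(["zk1:2181", "zk2:2181", "zk3:2181"], "kyuubi")
print(ha_url)
```

With a URL like this, the client resolves one of the registered servers at connection time, so individual servers can be added or removed without changing the client configuration.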
#### Apache Curator
> Apache Curator is a Java/JVM client library for Apache ZooKeeper, a distributed coordination service. It includes a high-level API framework and utilities to make using Apache ZooKeeper much easier and more reliable. It also includes recipes for common use cases and extensions such as service discovery and a Java 8 asynchronous DSL.
<p align=right>
<em>
<a href="https://curator.apache.org/">https://curator.apache.org</a>
</em>
</p>
## DataLake & LakeHouse
Kyuubi unifies DataLake & LakeHouse access in a simple, pure SQL way, and secures that access with authentication and SQL standard authorization.
### Apache Iceberg
> Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.
<p align=right>
<em>
<a href="http://iceberg.apache.org/">http://iceberg.apache.org/</a>
</em>
</p>
### Delta Lake
> Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
<p align=right>
<em>
<a href="https://delta.io/">https://delta.io</a>
</em>
</p>
### Apache Hudi
> Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores).
<p align=right>
<em>
<a href="https://hudi.apache.org/">https://hudi.apache.org</a>
</em>
</p>


@ -102,3 +102,9 @@ Kyuubi provides both high availability and load balancing solutions based on Zoo
   tools/index
   community/index

.. toctree::
   :caption: Appendix
   :maxdepth: 2

   appendix/index