Add Terminologies Documentation (2a6c7e7): terminology documentation for better understanding by both users and developers. Closes #321 from yaooqinn/terminology. Authored-by: jhx1008 <jhx1008@gmail.com>. Signed-off-by: Kent Yao <yao@apache.org>.

<div align=center>

</div>

# Terminologies

## Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark.

### JDBC

> The Java Database Connectivity (JDBC) API is the industry standard for database-independent connectivity between the Java programming language and a wide range of databases: SQL databases and other tabular data sources,
> such as spreadsheets or flat files.
> The JDBC API provides a call-level API for SQL-based database access.

> JDBC technology allows you to use the Java programming language to exploit "Write Once, Run Anywhere" capabilities for applications that require access to enterprise data.
> With a JDBC technology-enabled driver, you can connect all corporate data even in a heterogeneous environment.

<p align=right>
<em>
<a href="https://www.oracle.com/java/technologies/javase/javase-tech-database.html">https://www.oracle.com/java/technologies/javase/javase-tech-database.html</a>
</em>
</p>

Typically, there is a gap between business development and big data analytics.
If the two are forcibly coupled, the resulting system becomes difficult to operate and optimize.
On the flip side, if they are decoupled, the value of both can be maximized.
Business experts can stay focused on their own business development,
while big data engineers can continuously optimize server-side performance and stability.
Kyuubi bridges the two through an easy-to-use JDBC interface.

#### Apache Hive

> The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

<p align=right>
<em>
<a href="https://hive.apache.org/">https://hive.apache.org</a>
</em>
</p>

Kyuubi supports the Hive JDBC driver, which helps you seamlessly migrate your slow queries from Hive to Spark SQL.
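
As a rough illustration of what reusing the Hive JDBC driver looks like from the client side, the sketch below assembles a HiveServer2-style JDBC URL that a Hive JDBC client could point at a Kyuubi server instead of at Hive. The helper name `build_kyuubi_jdbc_url`, the host name, and the port `10009` are assumptions made for this example, not values taken from this page; use the address of your own deployment.

```python
from typing import Mapping, Optional


def build_kyuubi_jdbc_url(
    host: str,
    port: int = 10009,  # assumed port for illustration; check your deployment
    database: str = "default",
    session_vars: Optional[Mapping[str, str]] = None,
) -> str:
    """Assemble a HiveServer2-style JDBC URL (hypothetical helper).

    Layout: jdbc:hive2://<host>:<port>/<database>[;key=value...]
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # Session variables are appended after the database, separated by ';'.
    for key, value in (session_vars or {}).items():
        url += f";{key}={value}"
    return url


print(build_kyuubi_jdbc_url("kyuubi.example.com"))
```

Because only the host and port in the URL change, an application that already talks to Hive over JDBC can usually be redirected without code changes beyond its connection string.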

#### Apache Thrift

> The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

<p align=right>
<em>
<a href="https://thrift.apache.org/">https://thrift.apache.org</a>
</em>
</p>

### Server

A server is a daemon process that handles concurrent connection and query requests and converts these requests into operations against the **query engines** to complete the responses to clients.

_**Aliases: Kyuubi Server / Kyuubi Instance / k.i.**_

### ServerSpace

A ServerSpace is used to register servers and expose them together as a service layer to clients.

### Engine

An engine handles all queries submitted through Kyuubi servers.
It is created by one Kyuubi server and can be shared with other Kyuubi servers by registering itself to an engine namespace.
Its capabilities are mainly powered by Spark SQL.

_**Aliases: Query Engine / Engine Instance / e.i.**_

### EngineSpace

An EngineSpace is used internally by servers to register and interact with engines.
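
The relationship between servers, engines, and an EngineSpace can be pictured with a small in-memory model. Everything in this sketch is hypothetical: the class names, the one-engine-per-user sharing key, and the dictionary standing in for the namespace are assumptions made for illustration, not Kyuubi's implementation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional

_engine_ids = count(1)


@dataclass
class Engine:
    """Stand-in for a query engine; a real one would run Spark SQL."""
    owner: str        # user the engine was launched for (assumed sharing key)
    created_by: str   # name of the server that launched it
    engine_id: int = field(default_factory=lambda: next(_engine_ids))


class EngineSpace:
    """Hypothetical namespace in which engines are registered."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def register(self, engine: Engine) -> None:
        self._engines[engine.owner] = engine

    def lookup(self, owner: str) -> Optional[Engine]:
        return self._engines.get(owner)


class Server:
    """Hypothetical server that reuses a registered engine or creates one."""

    def __init__(self, name: str, space: EngineSpace) -> None:
        self.name = name
        self.space = space

    def engine_for(self, user: str) -> Engine:
        engine = self.space.lookup(user)
        if engine is None:
            # In this sketch the server registers on the engine's behalf.
            engine = Engine(owner=user, created_by=self.name)
            self.space.register(engine)
        return engine


space = EngineSpace()
server_a = Server("server-a", space)
server_b = Server("server-b", space)

first = server_a.engine_for("alice")   # server-a creates and registers the engine
second = server_b.engine_for("alice")  # server-b finds it in the shared space
print(first is second, second.created_by)  # prints: True server-a
```

The point of the model is only that the engine outlives the choice of server: whichever server a client reaches, the lookup in the shared space leads to the same engine.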

#### Apache Spark

> [Apache Spark™](https://spark.apache.org/) is a unified analytics engine for large-scale data processing.

<p align=right>
<em>
<a href="https://spark.apache.org">https://spark.apache.org</a>
</em>
</p>

### Multi Tenancy

Kyuubi guarantees end-to-end multi-tenant isolation and sharing in the following pipeline:

```
Client --> Kyuubi --> Query Engine (Spark) --> Resource Manager --> Data Storage Layer
```

### High Availability / Load Balance

For an enterprise service, honoring SLA commitments is essential. Deploying Kyuubi in High Availability (HA) mode helps you meet them.
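
One way to picture how HA and load balancing fit together is a client that picks any server registered in a ServerSpace and falls back to another one if its first choice is unreachable. The sketch below is an in-memory toy under stated assumptions: the `ServerSpace` class is modeled as a plain set of addresses, random selection is just one possible balancing policy, and `is_reachable` is a hypothetical probe supplied by the caller. None of these names come from Kyuubi itself.

```python
import random
from typing import Callable, Iterable, List, Optional, Set


class ServerSpace:
    """Hypothetical registry of live server addresses."""

    def __init__(self, addresses: Iterable[str] = ()) -> None:
        self._addresses: Set[str] = set(addresses)

    def register(self, address: str) -> None:
        self._addresses.add(address)

    def deregister(self, address: str) -> None:
        self._addresses.discard(address)

    def servers(self) -> List[str]:
        return sorted(self._addresses)


def pick_server(
    space: ServerSpace,
    is_reachable: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> str:
    """Try registered servers in random order; return the first reachable one."""
    rng = rng or random.Random()
    candidates = space.servers()
    rng.shuffle(candidates)  # random order spreads clients across servers
    for address in candidates:
        if is_reachable(address):
            return address
    raise ConnectionError("no reachable server registered in the ServerSpace")


space = ServerSpace(["kyuubi-1:10009", "kyuubi-2:10009", "kyuubi-3:10009"])
down = {"kyuubi-2:10009"}  # pretend this instance has failed
chosen = pick_server(space, lambda addr: addr not in down)
print(chosen)  # one of the two healthy addresses
```

As long as at least one registered server is reachable, the client still gets an address, which is the availability property HA mode is meant to provide.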

#### Apache ZooKeeper

> Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.

<p align=right>
<em>
<a href="https://zookeeper.apache.org/">https://zookeeper.apache.org</a>
</em>
</p>

#### Apache Curator

> Apache Curator is a Java/JVM client library for Apache ZooKeeper, a distributed coordination service. It includes a high-level API framework and utilities to make using Apache ZooKeeper much easier and more reliable. It also includes recipes for common use cases and extensions such as service discovery and a Java 8 asynchronous DSL.

<p align=right>
<em>
<a href="https://curator.apache.org/">https://curator.apache.org</a>
</em>
</p>

## DataLake & LakeHouse

Kyuubi unifies DataLake and LakeHouse access through a simple, pure-SQL interface, and secures that access with authentication and SQL-standard authorization.

### Apache Iceberg

> Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.

<p align=right>
<em>
<a href="http://iceberg.apache.org/">http://iceberg.apache.org/</a>
</em>
</p>

### Delta Lake

> Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.

<p align=right>
<em>
<a href="https://delta.io/">https://delta.io</a>
</em>
</p>

### Apache Hudi

> Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores).

<p align=right>
<em>
<a href="https://hudi.apache.org/">https://hudi.apache.org</a>
</em>
</p>