[KYUUBI #321] Add Terminologies Documentation
### _Which issue are you going to fix?_
Fixes #${ID}

### _Why are the changes needed?_
Add terminology documentation for better understanding by both users and developers.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/latest/tools/testing.html#running-tests) locally before making a pull request

Closes #321 from yaooqinn/terminology.

2a6c7e7 [jhx1008] Add Terminologies Documentation

Authored-by: jhx1008 <jhx1008@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
parent af58331f67
commit 97c0ff8395
@ -28,7 +28,7 @@ It embraces Spark and builds an ecosystem on top of it,
which allows Kyuubi to quickly expand its existing ecosystem and introduce new features,
such as cloud-native support and `Data Lake/Lake House` support.

Kyuubi's vision is to build on top of Apache Spark and Data Lake technologies to unify the portal and become an ideal data lake management platform.
The vision of Kyuubi is to build on top of Apache Spark and Data Lake technologies to unify the portal and become an ideal data lake management platform.
It can support data processing e.g. ETL, and analytics e.g. BI in a pure SQL way.
All workloads can be done on one platform, using one copy of data, with one SQL interface.

12
docs/appendix/index.rst
Normal file
@ -0,0 +1,12 @@
.. image:: ../imgs/kyuubi_logo.png
   :align: center

Appendixes
==========

.. toctree::
   :maxdepth: 3
   :numbered: 4


   terminology
153
docs/appendix/terminology.md
Normal file
@ -0,0 +1,153 @@
<div align=center>

</div>

# Terminologies

## Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark.

### JDBC

> The Java Database Connectivity (JDBC) API is the industry standard for database-independent connectivity between the Java programming language and a wide range of databases: SQL databases and other tabular data sources,
> such as spreadsheets or flat files.
> The JDBC API provides a call-level API for SQL-based database access.

> JDBC technology allows you to use the Java programming language to exploit "Write Once, Run Anywhere" capabilities for applications that require access to enterprise data.
> With a JDBC technology-enabled driver, you can connect all corporate data even in a heterogeneous environment.

<p align=right>
<em>
<a href="https://www.oracle.com/java/technologies/javase/javase-tech-database.html">https://www.oracle.com/java/technologies/javase/javase-tech-database.html</a>
</em>
</p>

Typically, there is a gap between business development and big data analytics.
If the two are forcibly coupled, the resulting system becomes difficult to operate and optimize.
On the flip side, if they are decoupled, the value of both can be maximized.
Business experts can stay focused on their own business development,
while Big Data engineers can continuously optimize server-side performance and stability.
Kyuubi combines the two seamlessly through an easy-to-use JDBC interface.

#### Apache Hive

> The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

<p align=right>
<em>
<a href="https://hive.apache.org/">https://hive.apache.org</a>
</em>
</p>

Kyuubi supports the Hive JDBC driver, which helps you seamlessly migrate your slow queries from Hive to Spark SQL.

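To make the Hive JDBC compatibility a little more concrete, the sketch below assembles a `jdbc:hive2://` connection URL of the kind the Hive JDBC driver accepts. The helper name `build_jdbc_url`, the host `kyuubi.example.com`, the port `10009`, and the session setting shown are all assumptions chosen for illustration; none of them come from this document.

```python
def build_jdbc_url(host, port, database="default", session_confs=None):
    """Assemble a Hive-style JDBC URL: jdbc:hive2://host:port/db;key=value;...

    Hypothetical helper for illustration only; it just formats a string
    and does not open a connection.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # Optional session settings are appended as ';key=value' pairs.
    for key, value in (session_confs or {}).items():
        url += f";{key}={value}"
    return url


# Assumed host and port, not taken from the documentation above.
url = build_jdbc_url("kyuubi.example.com", 10009)
print(url)
```

A client that already talks to Hive through this driver would, under these assumptions, only need to point the same kind of URL at a Kyuubi server instead.
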
#### Apache Thrift

> The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

<p align=right>
<em>
<a href="https://thrift.apache.org/">https://thrift.apache.org</a>
</em>
</p>

### Server

Server is a daemon process that handles concurrent connection and query requests and converts these requests into various operations against the **query engines** to complete the responses to clients.

_**Aliases: Kyuubi Server / Kyuubi Instance / k.i.**_

### ServerSpace

A ServerSpace is used to register servers and expose them together as a service layer to clients.

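The relationship between servers and a ServerSpace can be pictured with a small in-memory model. This is only a sketch: the class name `ServerSpace`, its methods, and the random selection strategy are assumptions made for illustration and do not describe how Kyuubi actually stores or selects registered servers.

```python
import random


class ServerSpace:
    """Hypothetical in-memory stand-in for a namespace that servers register to."""

    def __init__(self, name):
        self.name = name
        self._servers = []

    def register(self, address):
        # A server announces itself so that clients can discover it.
        if address not in self._servers:
            self._servers.append(address)

    def deregister(self, address):
        # A server that shuts down is removed from the service layer.
        if address in self._servers:
            self._servers.remove(address)

    def pick(self, rng=random):
        # Clients see one service layer; any registered server may answer.
        if not self._servers:
            raise LookupError(f"no server registered in {self.name!r}")
        return rng.choice(self._servers)


space = ServerSpace("kyuubi")
space.register("kyuubi-1.example.com:10009")
space.register("kyuubi-2.example.com:10009")
print(space.pick())
```

The point of the model is that clients address the namespace rather than an individual server, so servers can join or leave without clients changing their configuration.
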
### Engine

An engine handles all queries through Kyuubi servers.
It is created by one Kyuubi server and can be shared with other Kyuubi servers by registering itself to an engine namespace.
Its capabilities are mainly powered by Spark SQL.

_**Aliases: Query Engine / Engine Instance / e.i.**_

### EngineSpace

An EngineSpace is internally used by servers to register and interact with engines.

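The idea that an engine is created by one server and then reused by others through an engine namespace can be sketched as a get-or-create lookup. The class `EngineSpace`, the method `get_or_create`, and in particular the choice to key engines by user name are assumptions for this example; the text above does not specify how engines are keyed or shared.

```python
class EngineSpace:
    """Hypothetical in-memory model of a namespace that engines register to."""

    def __init__(self):
        self._engines = {}  # sharing key (assumed here: user name) -> engine record
        self._created = 0

    def get_or_create(self, user, server):
        engine = self._engines.get(user)
        if engine is None:
            # No engine registered yet: this server creates and registers one.
            self._created += 1
            engine = {
                "id": f"engine-{self._created}",
                "owner": user,
                "created_by": server,
            }
            self._engines[user] = engine
        # Otherwise the engine registered earlier is reused, whichever server asks.
        return engine


engines = EngineSpace()
first = engines.get_or_create("alice", server="server-a")
second = engines.get_or_create("alice", server="server-b")
print(first["id"], second["created_by"])
```

In this model the second server never launches anything; it finds the engine that the first server registered and sends its queries there.
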
#### Apache Spark

> [Apache Spark™](https://spark.apache.org/) is a unified analytics engine for large-scale data processing.

<p align=right>
<em>
<a href="https://spark.apache.org">https://spark.apache.org</a>
</em>
</p>

### Multi Tenancy

Kyuubi guarantees end-to-end multi-tenant isolation and sharing in the following pipeline:

```
Client --> Kyuubi --> Query Engine(Spark) --> Resource Manager --> Data Storage Layer
```

### High Availability / Load Balance

For an enterprise service, SLA commitment is essential. Deploying Kyuubi in High Availability (HA) mode helps you meet that commitment.

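One way to picture what HA mode offers a client is failover across several server instances: if one instance is unreachable, the next one is tried. The function `connect_with_failover`, the stub `fake_connect`, and the addresses below are hypothetical and exist only for this sketch; they are not Kyuubi APIs and say nothing about how Kyuubi itself implements HA.

```python
def connect_with_failover(instances, connect):
    """Try each instance in order and return the first successful connection."""
    failures = []
    for address in instances:
        try:
            return connect(address)
        except ConnectionError as exc:
            # Remember the failure and move on to the next instance.
            failures.append(f"{address}: {exc}")
    raise ConnectionError("all instances failed: " + "; ".join(failures))


def fake_connect(address):
    # Hypothetical connector: pretend the first instance is down.
    if address == "kyuubi-1.example.com:10009":
        raise ConnectionError("connection refused")
    return f"session@{address}"


session = connect_with_failover(
    ["kyuubi-1.example.com:10009", "kyuubi-2.example.com:10009"],
    fake_connect,
)
print(session)
```

With a single instance, the same outage would surface to the client as an error; with more than one, the request is simply served elsewhere.
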
#### Apache ZooKeeper

> Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.

<p align=right>
<em>
<a href="https://zookeeper.apache.org/">https://zookeeper.apache.org</a>
</em>
</p>

#### Apache Curator

> Apache Curator is a Java/JVM client library for Apache ZooKeeper, a distributed coordination service. It includes a highlevel API framework and utilities to make using Apache ZooKeeper much easier and more reliable. It also includes recipes for common use cases and extensions such as service discovery and a Java 8 asynchronous DSL.

<p align=right>
<em>
<a href="https://curator.apache.org/">https://curator.apache.org</a>
</em>
</p>

## DataLake & LakeHouse

Kyuubi unifies DataLake & LakeHouse access in the simplest pure SQL way. It is also the most secure way, with authentication and SQL standard authorization.

### Apache Iceberg

> Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.

<p align=right>
<em>
<a href="http://iceberg.apache.org/">http://iceberg.apache.org/</a>
</em>
</p>

### Delta Lake

> Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.

<p align=right>
<em>
<a href="https://delta.io/">https://delta.io</a>
</em>
</p>

### Apache Hudi

> Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores).

<p align=right>
<em>
<a href="https://hudi.apache.org/">https://hudi.apache.org</a>
</em>
</p>
@ -102,3 +102,9 @@ Kyuubi provides both high availability and load balancing solutions based on Zoo

   tools/index
   community/index

.. toctree::
   :caption: Appendix
   :maxdepth: 2

   appendix/index