<!--
Thanks for sending a pull request!
Here are some tips for you:
1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/contributions.html
2. If the PR is related to an issue in https://github.com/apache/incubator-kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'.
3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'.
-->
### _Why are the changes needed?_
<!--
Please clarify why the changes are needed. For instance,
1. If you add a feature, you can talk about the use case of it.
2. If you fix a bug, you can clarify why it is a bug.
-->
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before make a pull request
Closes #951 from pan3793/license.
Closes #951
4629eecd [Cheng Pan] Fix
c45a0784 [Cheng Pan] nit
b9a46b42 [Cheng Pan] pin license header at first line
80d1a71b [Cheng Pan] nit
b2a46e4c [Cheng Pan] Update
f6acaaf8 [Cheng Pan] minor
ef99183f [Cheng Pan] Add license header on all docs
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit a76c344042)
Signed-off-by: Cheng Pan <chengpan@apache.org>
171 lines
6.2 KiB
Markdown
171 lines
6.2 KiB
Markdown
<!--
|
|
- Licensed to the Apache Software Foundation (ASF) under one or more
|
|
- contributor license agreements. See the NOTICE file distributed with
|
|
- this work for additional information regarding copyright ownership.
|
|
- The ASF licenses this file to You under the Apache License, Version 2.0
|
|
- (the "License"); you may not use this file except in compliance with
|
|
- the License. You may obtain a copy of the License at
|
|
-
|
|
- http://www.apache.org/licenses/LICENSE-2.0
|
|
-
|
|
- Unless required by applicable law or agreed to in writing, software
|
|
- distributed under the License is distributed on an "AS IS" BASIS,
|
|
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
- See the License for the specific language governing permissions and
|
|
- limitations under the License.
|
|
-->
|
|
|
|
<div align=center>
|
|
|
|

|
|
|
|
</div>
|
|
|
|
# Terminologies
|
|
|
|
## Kyuubi
|
|
|
|
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark.
|
|
|
|
### JDBC
|
|
|
|
> The Java Database Connectivity (JDBC) API is the industry standard for database-independent connectivity between the Java programming language and a wide range of databases SQL databases and other tabular data sources,
|
|
> such as spreadsheets or flat files.
|
|
> The JDBC API provides a call-level API for SQL-based database access.
|
|
|
|
> JDBC technology allows you to use the Java programming language to exploit "Write Once, Run Anywhere" capabilities for applications that require access to enterprise data.
|
|
> With a JDBC technology-enabled driver, you can connect all corporate data even in a heterogeneous environment.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://www.oracle.com/java/technologies/javase/javase-tech-database.html">https://www.oracle.com/java/technologies/javase/javase-tech-database.html</a>
|
|
</em>
|
|
</p>
|
|
|
|
Typically, there is a gap between business development and big data analytics.
|
|
If the two are forcefully coupled, it would make the corresponding system difficult to operate and optimize.
|
|
One the flip side, if decoupled, the values of both can be maximized.
|
|
Business experts can stay focused on their own business development,
|
|
while Big Data engineers can continuously optimize server-side performance and stability.
|
|
Kyuubi combines the two seamlessly through an easy-to-use JDBC interface.
|
|
|
|
#### Apache Hive
|
|
|
|
> The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://hive.apache.org/">https://hive.apache.org</a>
|
|
</em>
|
|
</p>
|
|
|
|
Kyuubi supports Hive JDBC driver, which helps you seamlessly migrate your slow queries from Hive to Spark SQL.
|
|
|
|
#### Apache Thrift
|
|
|
|
> The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://thrift.apache.org/">https://thrift.apache.org</a>
|
|
</em>
|
|
</p>
|
|
|
|
### Server
|
|
|
|
Server is a daemon process that handles concurrent connection and query requests and converting these requests into various operations against the **query engines** to complete the responses to clients.
|
|
|
|
_**Aliases: Kyuubi Server / Kyuubi Instance / k.i.**_
|
|
|
|
### ServerSpace
|
|
|
|
A ServerSpace is used to register servers and expose them together as a service layer to clients.
|
|
|
|
### Engine
|
|
|
|
An engine handles all queries through Kyuubi servers.
|
|
It is created one Kyuubi server and can be shared with other Kyuubi servers by registering itself to an engine namespace.
|
|
All its capabilities are mainly powered by Spark SQL.
|
|
|
|
_**Aliases: Query Engine / Engine Instance / e.i.**_
|
|
|
|
### EngineSpace
|
|
|
|
An EngineSpace is internally used by servers to register and interact with engines.
|
|
|
|
#### Apache Spark
|
|
|
|
> [Apache Spark™](https://spark.apache.org/) is a unified analytics engine for large-scale data processing.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://spark.apache.org">https://spark.apache.org</a>
|
|
</em>
|
|
</p>
|
|
|
|
### Multi Tenancy
|
|
|
|
Kyuubi guarantees end-to-end multi-tenant isolation and sharing in the following pipeline
|
|
|
|
```
|
|
Client --> Kyuubi --> Query Engine(Spark) --> Resource Manager --> Data Storage Layer
|
|
```
|
|
|
|
### High Availability / Load Balance
|
|
|
|
As an enterprise service, SLA commitment is essential. Deploying Kyuubi in High Availability (HA) mode helps you guarantee that.
|
|
|
|
#### Apache Zookeeper
|
|
|
|
> Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://zookeeper.apache.org/">https://zookeeper.apache.org</a>
|
|
</em>
|
|
</p>
|
|
|
|
#### Apache Curator
|
|
|
|
> Apache Curator is a Java/JVM client library for Apache ZooKeeper, a distributed coordination service. It includes a highlevel API framework and utilities to make using Apache ZooKeeper much easier and more reliable. It also includes recipes for common use cases and extensions such as service discovery and a Java 8 asynchronous DSL.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://curator.apache.org/">https://curator.apache.org</a>
|
|
</em>
|
|
</p>
|
|
|
|
## DataLake & LakeHouse
|
|
|
|
Kyuubi unifies DataLake & LakeHouse access in the simplest pure SQL way, meanwhile it's also the securest way with authentication and SQL standard authorization.
|
|
|
|
### Apache Iceberg
|
|
|
|
> Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="http://iceberg.apache.org/">http://iceberg.apache.org/</a>
|
|
</em>
|
|
</p>
|
|
|
|
### Delta Lake
|
|
|
|
> Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://delta.io/">https://delta.io</a>
|
|
</em>
|
|
</p>
|
|
|
|
### Apache Hudi
|
|
|
|
> Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores).
|
|
|
|
<p align=right>
|
|
<em>
|
|
<a href="https://hudi.apache.org/">https://hudi.apache.org</a>
|
|
</em>
|
|
</p>
|