24 Commits

Author SHA1 Message Date
liangbowen
c8a138f986
[KYUUBI #4933] [DOCS] [MINOR] Mark spark.sql.optimizer.insertRepartitionNum config for Spark 3.1 only
### _Why are the changes needed?_

- Update the docs to mark the Spark plugin config `spark.sql.optimizer.insertRepartitionNum` as applying to Spark 3.1 only

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4933 from bowenliang123/insert-num.

Closes #4933

5ed6e2867 [liangbowen] comment and style
280a6af03 [liangbowen] spark.sql.optimizer.insertRepartitionNum only available for Spark 3.1.x
7f01cf3b6 [liangbowen] spark.sql.optimizer.insertRepartitionNum only available for Spark 3.1.x

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
2023-06-09 08:30:23 +08:00
wforget
408862af72
[KYUUBI #4814] Introduce Apache Atlas hook support in lineage plugin
### _Why are the changes needed?_

Implements AtlasLineageDispatcher to send lineage to Apache Atlas.

close #4814

Atlas Spark Model Definition: https://github.com/apache/atlas/blob/master/addons/models/1000-Hadoop/1100-spark_model.json

spark process:

![1](https://github.com/apache/kyuubi/assets/17894939/28e2c68c-0ffd-4f1d-b805-a7e964f85aab)

table lineage:

![2](https://github.com/apache/kyuubi/assets/17894939/76b3db6d-ed50-42e3-97cf-2f96d4e403df)

column lineage:

![3](https://github.com/apache/kyuubi/assets/17894939/41ae6ef8-acbf-43b9-ad05-42d669c5e950)

### _How was this patch tested?_
- [X] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [X] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4815 from wForget/KYUUBI-4814.

Closes #4814

3df8a7ec9 [wforget] comments
c58eae7c5 [wforget] comments
926bcf211 [wforget] comment
e0b4067c3 [wforget] comment
e4cc3e3f8 [wforget] comments
adc72b96f [Bowen Liang] Update extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala
e3bdd1c65 [Bowen Liang] Update extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala
baf1711ac [Bowen Liang] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcherSuite.scala
61e79f3d5 [Bowen Liang] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcherSuite.scala
541df3780 [Bowen Liang] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasLineageDispatcherSuite.scala
5dd310657 [wforget] fix
cea1e137d [wforget] fix
f028d4b09 [wforget] fix
0c9b4516b [wforget] fix
6f8113032 [wforget] add close atlas client shutdown hook
3f4d2a7db [wforget] add remote user
a0db58afc [wforget] comments
6dd3c66df [wforget] comments
f2b2a30dc [wforget] style
83eb1e481 [wforget] add atlas.column.lineage.enable configuration
0719a2b65 [wforget] doc
05f936005 [wforget] fix
d169b661d [wforget] fix
6da80d742 [wforget] fix
820ae5c5f [wforget] column lineages
dabe8173e [wforget] license
f22e044d2 [wforget] test
b948bce90 [wforget] fix and add test
0aef1be6b [wforget] fix
368b5ab3f [wforget] [KYUUBI-4814] Implements AtlasLineageDispatcher to send lineage to Apache Atlas

Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: Bowen Liang <bowenliang@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-06 17:47:19 +08:00
wforget
19d5a9a371
[KYUUBI #4641] Add MaxFileSizeStrategy to limit max scan file size
### _Why are the changes needed?_

Add MaxFileSizeStrategy to limit max scan file size.
close #4641
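
As a rough illustration of the idea (not the plugin's actual Scala code), a size guard of this kind sums the sizes of the files a scan would read and rejects the scan when the total exceeds a configured limit. The names `ScanFile`, `ScanTooLargeError` and `check_scan_size` are hypothetical, and treating a missing limit as "no check" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScanFile:
    """A file that a table scan would read (hypothetical model)."""
    path: str
    size_bytes: int


class ScanTooLargeError(Exception):
    """Raised when a scan exceeds the configured byte limit."""


def check_scan_size(files: Iterable[ScanFile], max_bytes: Optional[int]) -> int:
    """Return the total scan size, or raise if it exceeds max_bytes.

    max_bytes=None models "no limit configured" (an assumption),
    so every scan passes in that case.
    """
    total = sum(f.size_bytes for f in files)
    if max_bytes is not None and total > max_bytes:
        raise ScanTooLargeError(
            f"scan reads {total} bytes, limit is {max_bytes} bytes"
        )
    return total
```

Raising at planning time, rather than truncating the scan, keeps the failure visible to the user who submitted the query.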

### _How was this patch tested?_
- [X] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4642 from wForget/KYUUBI-4641.

Closes #4641

14a680f8e [wforget] comment
d2a393d97 [wforget] comment
b1ef4c52c [wforget] fix
d9e94bd8e [wforget] fix style
8a9121131 [wforget] use optional value
094eb61e3 [wforget] combine
89e2cb4d0 [wforget] [KYUUBI-4641] Add MaxFileSizeStrategy to limit max scan file size

Authored-by: wforget <643348094@qq.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-04-23 17:51:44 +08:00
Cheng Pan
609018a6b2
[KYUUBI #4727] [DOC] kyuubi-spark-lineage has no transitive deps
### _Why are the changes needed?_

Update outdated docs

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4727 from pan3793/lineage-doc.

Closes #4727

b6843b282 [Cheng Pan] [DOC] kyuubi-spark-lineage has no transitive deps

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: odone <odone.zhang@gmail.com>
2023-04-19 17:48:14 +08:00
ulysses-you
f0615a9aab
[KYUUBI #4683] Update spark.sql.finalWriteStage.resourceIsolation.enabled version
### _Why are the changes needed?_

fix the wrong version

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4683 from ulysses-you/followup.

Closes #4683

8e5d46fda [ulysses-you] update version

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-04-10 16:47:28 +08:00
ulysses-you
91a2ab3665
[KYUUBI #4678] Improve FinalStageResourceManager kill executors
### _Why are the changes needed?_

This PR changes two things:
1. Add a config that controls whether executors are killed when the plan contains table caches. Killing executors is not always safe if the cache is referenced by two write-like plans.
2. Force `adjustTargetNumExecutors` when killing executors. `YarnAllocator` might re-request the original target number of executors if DRA has not updated the target yet. Note that DRA re-adjusts executors if there are more tasks to run, so this is safe. It is better to adjust the target number of executors as soon as executors are killed.

### _How was this patch tested?_
These issues were found during my POC.

Closes #4678 from ulysses-you/skip-cache.

Closes #4678

b12620954 [ulysses-you] Improve kill executors

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-04-10 11:41:37 +08:00
ulysses-you
9ca00c8aa7
[KYUUBI #4615] Support stage level schedule for final write stage
### _Why are the changes needed?_

Add a new rule `InjectCustomResourceProfile` to support a custom resource profile for the final write stage.
It currently supports these executor configs:

```
executor core
executor memory
executor memory overhead
executor off heap memory
```

### _How was this patch tested?_

Added tests and tested manually.

<img width="778" alt="image" src="https://user-images.githubusercontent.com/12025282/226606147-82a29b8c-1a31-4842-97a7-fe702d80e190.png">

Closes #4615 from ulysses-you/resource-profile.

Closes #4615

852b207cd [ulysses-you] Support stage level schedule for final write stag

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-03-30 13:04:51 +08:00
ulysses-you
b8f452692a
[KYUUBI #4592] Support eagerly kill redundant executors
### _Why are the changes needed?_

This PR adds a new rule, `FinalStageResourceManager`, to eagerly kill redundant executors.

We first get the number of final stage partitions, which is the number of cores actually required, then kill the redundant executors. Executors are killed in the following priority order:
  1. kill younger executors first (the older an executor is, the better its JIT-compiled code works)
  2. kill executors that have produced less shuffle data first

The reason for adding this feature is that, if the previous stage uses many executors but the final stage needs fewer, the tasks of the final stage are scheduled randomly across all existing executors, which may waste resources. For example, each executor may run only 1 or 2 tasks while holding 4 or 5 cores.
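
The selection order described above can be sketched in a few lines of Python. This is only an illustrative model, not the rule's implementation: the `Executor` record, the assumption of one core per task, and the choice to always keep at least one executor are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import ceil
from typing import List


@dataclass(frozen=True)
class Executor:
    """Hypothetical view of a running executor."""
    executor_id: str
    start_time: int      # larger value = started later = younger
    shuffle_bytes: int   # shuffle data produced so far


def executors_to_kill(executors: List[Executor],
                      final_stage_partitions: int,
                      cores_per_executor: int) -> List[str]:
    """Pick redundant executors: youngest first, then least shuffle data."""
    # Assumption: each final-stage partition needs exactly one core.
    needed = max(ceil(final_stage_partitions / cores_per_executor), 1)
    surplus = len(executors) - needed
    if surplus <= 0:
        return []
    # Sort by start time descending (younger first); ties are broken by
    # the smaller amount of shuffle data.
    ranked = sorted(executors, key=lambda e: (-e.start_time, e.shuffle_bytes))
    return [e.executor_id for e in ranked[:surplus]]
```

With four 2-core executors and a 4-partition final stage, two executors are enough, so the two youngest are selected and the older, JIT-warmed ones keep running.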

### _How was this patch tested?_
test manually

- test for the kill executor
<img width="755" alt="image" src="https://user-images.githubusercontent.com/12025282/227203809-9fe0731c-f97f-40d2-ac7f-b892a2a35289.png">

Closes #4592 from ulysses-you/eagerly-kill-executors.

Closes #4592

f35208bfd [ulysses-you] nit
ec627ee4f [ulysses-you] nit
28d4230f8 [ulysses-you] address comments
f2492cec6 [ulysses-you] nit
f44e48451 [ulysses-you] Support eagerly kill redundant executors

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-03-24 18:24:53 +08:00
wForget
c4f3195bd6
[KYUUBI #3929] Refactor lineage plugin to add LineageDispatcher
### _Why are the changes needed?_

Refactor lineage plugin to add LineageDispatcher.

close #3929
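
The general shape of a dispatcher abstraction can be sketched as follows. This is a language-neutral illustration in Python, not the plugin's Scala interface: the method name `send`, the `InMemoryDispatcher` class and the `dispatch_all` helper are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List


class LineageDispatcher(ABC):
    """Receives a lineage record and forwards it to some sink."""

    @abstractmethod
    def send(self, lineage: dict) -> None:
        ...


class InMemoryDispatcher(LineageDispatcher):
    """Hypothetical sink that just collects records, handy for tests."""

    def __init__(self) -> None:
        self.received: List[dict] = []

    def send(self, lineage: dict) -> None:
        self.received.append(lineage)


def dispatch_all(dispatchers: List[LineageDispatcher], lineage: dict) -> None:
    """Send one lineage record to every configured dispatcher."""
    for dispatcher in dispatchers:
        dispatcher.send(lineage)
```

Separating extraction from delivery in this way lets a new sink be added as one more `LineageDispatcher` subclass without touching the code that parses lineage.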

### _How was this patch tested?_
- [X] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3919 from wForget/dev-lineage-dispatcher.

Closes #3929

5df2aa2f [wforget] add doc
98683ebc [wforget] fix
7b97b2e0 [wForget] rebase
4b046868 [wForget] separate LineageDispatcherType class file
e14cf838 [wForget] Refactor lineage plugin to add LineageDispatcher

Lead-authored-by: wForget <643348094@qq.com>
Co-authored-by: wforget <643348094@qq.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-03-22 09:46:14 +08:00
odone
25e7b22553
[KYUUBI #4330] Non-temporary views do not resolve to a specific real table
close #4330

### _Why are the changes needed?_

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4503 from iodone/kyuubi-4330.

Closes #4330

d2c48e7a [odone] Instead of `optimizedPlan` with `analyzedPlan`
12614d19 [odone] add skip permenent view support

Authored-by: odone <odone.zhang@gmail.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-03-16 12:58:41 +08:00
liangbowen
62eefdb57e
[KYUUBI #4235] [DOCS] Prefer https:// URLs in docs
### _Why are the changes needed?_

- Prefer `https://` URLs in docs, and all changed URLs are validated.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4235 from bowenliang123/https-link.

Closes #4235

f114dde2 [liangbowen] update AllKyuubiConfiguration
ad8aaedf [liangbowen] style
e973be5a [liangbowen] update
2370f4bf [liangbowen] prefer https URLs in docs

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
2023-02-03 14:01:11 +08:00
liangbowen
22e9fd7d68
[KYUUBI #4226] Fix word spelling typos in docs
### _Why are the changes needed?_

- fix word spelling typos in docs

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4226 from bowenliang123/doc-word-typo.

Closes #4226

393de90d [liangbowen] update
365cdc4b [liangbowen] fix word typos in docs

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
2023-02-02 11:43:03 +08:00
liangbowen
69d625a1be
[KYUUBI #4200] [Improvement] [Docs] Introduce Markdown formatting with spotless-maven-plugin and flexmark for docs
### _Why are the changes needed?_

- consolidate styles across markdown files, whether manually written or auto-generated
- apply markdown formatting rules with flexmark from [spotless-maven-plugin](https://github.com/diffplug/spotless/tree/main/plugin-maven#markdown) to *.md files in `/docs`
- use `flexmark` to format generated markdown in `TestUtils` of the common module, used by `AllKyuubiConfiguration` and `KyuubiDefinedFunctionSuite`, in the same way as `FlexmarkFormatterFunc` of `spotless-maven-plugin`, with `COMMONMARK` as `FORMATTER_EMULATION_PROFILE` (https://github.com/diffplug/spotless/blob/maven/2.30.0/lib/src/flexmark/java/com/diffplug/spotless/glue/markdown/FlexmarkFormatterFunc.java)
- use `flexmark` `0.62.2`, the last version that requires only Java 8+ (checked from the pom file and bytecode version)

```
<markdown>
    <includes>
        <include>docs/**/*.md</include>
    </includes>
    <flexmark></flexmark>
</markdown>
```

- Changes applied to markdown doc files:
  - no style changes or breakage in docs built by `make html`
  - remove the leading blank in license headers and comments to conform to markdown style rules
  - tables regenerated by flexmark following [GitHub Flavored Markdown](https://help.github.com/articles/organizing-information-with-tables/) (https://github.com/vsch/flexmark-java/wiki/Extensions#tables)

### _How was this patch tested?_
- [x] regenerate docs using `make html` successfully and check all the markdown pages available
- [x] regenerate `settings.md` and `functions.md` by `AllKyuubiConfiguration` and `KyuubiDefinedFunctionSuite`, and pass the checks by both themselves and spotless check via `dev/reformat`
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4200 from bowenliang123/markdown-formatting.

Closes #4200

1eeafce4 [liangbowen] revert minor changes in AllKyuubiConfiguration
4f892857 [liangbowen] use flexmark in markdown doc generation
8c978abd [liangbowen] changes on markdown files
a9190556 [liangbowen] apply markdown formatting rules with `spotless-maven-plugin` to markdown files with in `/docs`

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
2023-01-30 11:14:41 +08:00
liangbowen
89c7435dca
[KYUUBI #4161] [DOCS] Refine settings page with correction in grammar and spelling mistakes of config descriptions
### _Why are the changes needed?_

As Kyuubi has graduated to a top-level project, the settings page will be consulted more often and should be more reliable and readable, with fewer grammar and spelling mistakes.

This PR
- corrects mistakes in grammar, spelling, abbreviations and terminology
- changes no config names or essential meanings

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4161 from bowenliang123/conf-grammar.

Closes #4161

038edfbea [liangbowen] nit
1ec073a4b [liangbowen] to JSON
4f5259a32 [liangbowen] to Prometheus
523855008 [liangbowen] to K8s
fc7a3a81e [liangbowen] to AUTO-GENERATED
da64f54fa [liangbowen] update
d54f9a528 [liangbowen] fix `comma separated` to `comma-separated`
f1d7cc1f1 [liangbowen] update
d84208844 [liangbowen] update
1b75f011c [liangbowen] correction of grammar and spelling mistakes

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: Kent Yao <yao@apache.org>
2023-01-16 18:34:01 +08:00
jiaoqingbo
08f99d5270
[KYUUBI #4070] Add missing spark commands to lineage.md
### _Why are the changes needed?_

Fix #4070, with all commands listed in alphabetical order.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4072 from jiaoqingbo/kyuubi4070.

Closes #4070

abb62aeb [jiaoqingbo] [KYUUBI #4070] Add missing spark commands to lineage.md

Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-01-04 18:03:15 +08:00
ulysses-you
0495350082
[KYUUBI #3988] Final stage config isolation support write only
### _Why are the changes needed?_

Detect and inject a tag if the plan is for writing, then skip final stage isolation at the query preparation phase.

This makes the final stage config more flexible for complex Spark applications.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3988 from ulysses-you/final-stage.

Closes #3988

d0f2b622 [ulysses-you] fix
e5351fd5 [ulysses-you] nit
39082b20 [ulysses-you] Final stage config isolation support write only

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
2022-12-29 15:35:42 +08:00
yongqian
182227bd16
[KYUUBI #4024] [DOCS] Update the rules documentation for Kyuubi Spark SQL extension
### _Why are the changes needed?_

Update outdated docs.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4024 from QianyongY/features/update-outdated-docs.

Closes #4024

0ad7173f [yongqian] [DOCS] Update the rules documentation for Kyuubi Spark SQL extension

Authored-by: yongqian <yongqian@trip.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
2022-12-26 09:40:34 +08:00
ulysses-you
fa9e6be663
[KYUUBI #3962] Add two conditions to decide if add shuffle before writing
### _Why are the changes needed?_

Add two conditions to decide whether we should add a shuffle.

1. make sure AQE is enabled, otherwise adding a shuffle is meaningless
2. try to reduce the performance regression caused by adding a shuffle

For condition 2: we do not add a shuffle if the original plan does not have one.
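
The two conditions can be expressed as a small predicate. The sketch below works on a toy plan tree, not on Spark's plan classes: `PlanNode`, the `"Shuffle"` node name and both function names are hypothetical and exist only to illustrate the decision.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Toy plan node; a node named "Shuffle" stands for a shuffle."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def has_shuffle(plan: PlanNode) -> bool:
    """True if the plan or any of its descendants is a shuffle."""
    return plan.name == "Shuffle" or any(has_shuffle(c) for c in plan.children)


def should_add_shuffle(aqe_enabled: bool, plan: PlanNode) -> bool:
    """Condition 1: AQE must be on. Condition 2: the plan already shuffles."""
    return aqe_enabled and has_shuffle(plan)
```

A plan that is only a scan followed by a projection never pays for an extra shuffle under this predicate, which is the regression the second condition is meant to avoid.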

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3962 from ulysses-you/no-shuffle.

Closes #3962

a084cccc [ulysses-you] address comment
9d0aab1b [ulysses-you] address comment
09fc9b21 [ulysses-you] fix ut
06f249a2 [ulysses-you] Reduce the performance regression

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
2022-12-12 20:22:10 +08:00
ulysses-you
2acee9ea97
[KYUUBI #3601] [SPARK] Support infer columns for rebalance and sort
### _Why are the changes needed?_

Improve the rebalance-before-writing rule.

The rebalance-before-writing rule adds a rebalance at the top of the query for data writing commands. However, the default partitioning of rebalance is RoundRobinPartitioning, which breaks the original partitioning of the data and may make the output data larger than before.

This PR supports inferring the columns from join and aggregate for rebalance and sort, to improve the compression ratio.

Note that this improvement only works for static partition writing.
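
To give a feel for what "inferring the columns" means, here is a deliberately simplified sketch on a toy plan tree. It is not the rule's actual algorithm: the `Node` class, the node kinds, and the choice to look only through `project` and `filter` nodes are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy plan node. `keys` holds join keys or aggregate grouping keys."""
    kind: str  # "project", "filter", "join", "aggregate" or "scan"
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def infer_rebalance_columns(plan: Node) -> List[str]:
    """Walk down from the top of the query to the first join or aggregate.

    Returns its keys, or an empty list when nothing can be inferred, in
    which case a caller would fall back to round-robin partitioning.
    """
    node = plan
    while True:
        if node.kind in ("join", "aggregate") and node.keys:
            return list(node.keys)
        if node.kind in ("project", "filter") and len(node.children) == 1:
            node = node.children[0]
            continue
        return []
```

Rows that share a join or grouping key tend to sit next to each other after such a rebalance and sort, which is why columnar formats can compress them better than round-robin output.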

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3601 from ulysses-you/smart-order.

Closes #3601

c190dc1a [ulysses-you] docs
995969b5 [ulysses-you] view
ea23c417 [ulysses-you] Support infer columns for rebalance and sort

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
2022-10-17 18:13:50 +08:00
Bowen Liang
1a9bf93051
[KYUUBI #3487] Provide Hive JDBC Dialect support for Spark/PySpark to connect Kyuubi via JDBC Source
…and register to JdbcDialects

### _Why are the changes needed?_

close #3487 .

1. add kyuubi-extension-spark-client_2.12 module, and introduce KyuubiSparkClientExtension
2. implement HiveDialect and register to JdbcDialects
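
Conceptually, a JDBC dialect answers two questions: which URLs it applies to, and how identifiers are quoted in generated SQL. The Python sketch below models that behaviour only; the real dialect is a Scala class registered with Spark's `JdbcDialects`, and the exact URL prefix check and per-part backtick quoting shown here are assumptions about its behaviour, not a copy of it.

```python
class HiveDialectSketch:
    """Toy model of a Hive-style JDBC dialect (hypothetical class)."""

    def can_handle(self, url: str) -> bool:
        # Assumption: the dialect claims HiveServer2-style JDBC URLs.
        return url.lower().startswith("jdbc:hive2://")

    def quote_identifier(self, name: str) -> str:
        # Assumption: each dot-separated part is wrapped in backticks,
        # the identifier quoting style used by Hive-compatible SQL.
        return ".".join(f"`{part}`" for part in name.split("."))
```

For example, `quote_identifier("db.tbl")` yields `` `db`.`tbl` ``, whereas a generic dialect would typically emit double-quoted identifiers that a Hive-compatible server may not accept.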

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3489 from bowenliang123/3487-hive-jdbc-dialect.

Closes #3487

3ed8be75 [Bowen Liang] nit
47be0ba6 [Bowen Liang] update docs for hive jdbc dialect
84623a35 [Bowen Liang] update pom in minor details
b7edc6cf [Bowen Liang] add ut
968bb722 [Bowen Liang] move to package org.apache.spark.sql.dialect
03eab323 [Bowen Liang] renamed to kyuubi-extension-spark-jdbc-dialect module and moved to extensions/spark
9a4eaf44 [Bowen Liang] add kyuubi-extension-spark-client_2.12 module, implement HiveDialect and register to JdbcDialects

Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2022-09-16 10:29:39 +08:00
odone
99477f7a54
[KYUUBI #3312] Add subquery support for sql lineage parser
close #3312
### _Why are the changes needed?_

Supported SQL, for example:
```sql

-- ScalarQuery
select (select a from table0) as aa, b as bb from table0
select (select count(*) from table0) as aa, b as bb from table0

-- Left Semi or Anti Join
select * from table0 where table0.a in (select a from table1)
select * from table0 where table0.a not in (select a from table1)
select * from table0 where exists (select * from table1 where table0.c = table1.c)

```

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3384 from iodone/kyuubi-3312.

Closes #3312

e2af4e1c [odone] change lineage column __aggregate__ to __count__ if exist count(*)
d9c46c34 [odone] add aggregate expression lineage extracting
2fd63482 [odone] add subquery support

Authored-by: odone <odone.zhang@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
2022-09-08 09:28:00 +08:00
odone
9716548380
[KYUUBI #2282] Add lineage records for sql statement execution in Kyuubi engine logs
### _Why are the changes needed?_

Lineage information:
```
col0 -> (table.a, table.b)
col1 -> (table.c, table.a)
```

SQL lineage logger JSON format example.
**SQL:**
```
select a as col0, b as col1 from test_table0
```
**Lineage:**
```
{
   "inputTables": ["default.test_table0"],
   "outputTables": [],
   "columnLineage": [{
      "column": "col0",
      "originalColumns": ["default.test_table0.a"]
   }, {
      "column": "col1",
      "originalColumns": ["default.test_table0.b"]
   }]
}
```
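
A record with this shape can be assembled from a plain mapping of output columns to their source columns. The sketch below only mirrors the JSON layout shown above; the helper `build_lineage` is hypothetical and is not part of the plugin.

```python
import json
from typing import Dict, List


def build_lineage(input_tables: List[str],
                  output_tables: List[str],
                  column_sources: Dict[str, List[str]]) -> dict:
    """Build a lineage record using the field names from the example."""
    return {
        "inputTables": list(input_tables),
        "outputTables": list(output_tables),
        "columnLineage": [
            {"column": column, "originalColumns": list(sources)}
            for column, sources in column_sources.items()
        ],
    }


# Lineage for: select a as col0, b as col1 from test_table0
record = build_lineage(
    input_tables=["default.test_table0"],
    output_tables=[],
    column_sources={
        "col0": ["default.test_table0.a"],
        "col1": ["default.test_table0.b"],
    },
)
print(json.dumps(record, indent=3))
```

Because `originalColumns` is a list, a derived column such as `a + b as total` can point at more than one source column.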

Column lineage is currently supported for the following Spark `Command` and `Query` `TreeNode`s:

### Query
- `Select`

### Command
- `CreateDataSourceTableAsSelectCommand`
- `CreateHiveTableAsSelectCommand`
- `OptimizedCreateHiveTableAsSelectCommand`
- `CreateTableAsSelect`
- `ReplaceTableAsSelect`
- `InsertIntoDataSourceCommand`
- `InsertIntoHadoopFsRelationCommand`
- `InsertIntoDataSourceDirCommand`
- `InsertIntoHiveDirCommand`
- `InsertIntoHiveTable`

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3185 from iodone/kyuubi-2282.

Closes #2282

002c6d61 [odone] delete spark-sql-engine test for lineage
e1728a79 [odone] update lineage entity schema
de2a3e9a [odone] change kyuubi-spark-listener module to kyuubi-spark-lineage module
9258125e [odone] optimize lineage output
834669ed [odone] delete engine lineage parse
d9c7a3dc [odone] add spark listener to support lineage
4bae8c2f [odone] update for code cleaning
32b3392b [odone] update for review
fe09e478 [odone] add some test

Authored-by: odone <odone.zhang@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
2022-08-24 14:11:35 +08:00
Cheng Pan
6aa898e50e
[KYUUBI #3210] [DOCS] Mention Kyuubi Spark SQL extension supports Spark 3.3
### _Why are the changes needed?_

Update outdated docs.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3210 from pan3793/spark-extension.

Closes #3210

5e5ebd35 [Cheng Pan] Mention Kyuubi Spark SQL extension supports Spark 3.3

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2022-08-10 10:16:32 +08:00
Kent Yao
6c8024c8a4
[KYUUBI #3101] [Subtask][#3100] Build the content for extension points documentation
### _Why are the changes needed?_

Build the content for extension points documentation, pre-work for #3100

<img width="1767" alt="image" src="https://user-images.githubusercontent.com/8326978/179930987-1accbbb7-e804-4230-871f-6c4b1152f4a1.png">

1. the extensions are divided into 2: server side and engine side extensions. (Do we have client side extension support?)
2. the server side authentication page is cross-referenced by the security section, see 1 in the picture.
3. the engine side ones are grouped by different compute frameworks.
4. connector is one type of extension, so we cross-reference the connector pages directly, see 2 & 3 in the picture.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #3103 from yaooqinn/3101.

Closes #3101

a9ae3e32 [Kent Yao] [KYUUBI #3101] [Subtask][#3100] Build content for extension points documentation
3b7367e9 [Kent Yao] [KYUUBI #3101] [Subtask][#3100] Build content for extension points documentation
b5eda13e [Kent Yao] [KYUUBI #3101] [Subtask][#3100] Build content for extension points documentation

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
2022-07-21 15:37:19 +08:00