kyuubi/dev
odone 9716548380
[KYUUBI #2282] Add lineage records for sql statement execution in Kyuubi engine logs
### _Why are the changes needed?_

Column lineage records, for each output column, the source columns it is derived from. For example:
```
col0 -> (table.a, table.b)
col1 -> (table.c, table.a)
```
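To make the notation concrete, the mapping above can be held as a dictionary from each output column to its source columns. Inverting that dictionary answers the reverse question: which output columns a given source column feeds.

This is a hypothetical Python sketch. `downstream_columns` is an illustrative name and is not part of Kyuubi.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Column lineage from the example above: output column -> source columns.
# (Illustrative data structure; not Kyuubi's internal representation.)
lineage: Dict[str, List[str]] = {
    "col0": ["table.a", "table.b"],
    "col1": ["table.c", "table.a"],
}


def downstream_columns(lineage: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert column lineage: source column -> output columns derived from it."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for output_column, sources in lineage.items():
        for source in sources:
            inverted[source].add(output_column)
    return dict(inverted)


# table.a contributes to both col0 and col1; table.b only to col0.
impact = downstream_columns(lineage)
```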

An example of the JSON format written by the SQL lineage logger:
**SQL:**
```sql
select a as col0, b as col1 from test_table0
```
**Lineage:**
```json
{
   "inputTables": ["default.test_table0"],
   "outputTables": [],
   "columnLineage": [{
      "column": "col0",
      "originalColumns": ["default.test_table0.a"]
   }, {
      "column": "col1",
      "originalColumns": ["default.test_table0.b"]
   }]
}
```
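The record above has three fields: the tables read, the tables written, and one entry per output column listing its fully qualified source columns.

The sketch below models that shape in Python so the record can be serialized, parsed back, and queried. It rests on some assumptions:

- The `Lineage` and `ColumnLineage` classes and the `sources_of` helper are hypothetical names chosen for illustration.
- They are not the classes of the `kyuubi-spark-lineage` module.
- Only the field names are taken from the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ColumnLineage:
    # Output column name and the fully qualified source columns it derives from.
    column: str
    originalColumns: List[str]


@dataclass
class Lineage:
    # Field names mirror the JSON keys in the example record (hypothetical model).
    inputTables: List[str] = field(default_factory=list)
    outputTables: List[str] = field(default_factory=list)
    columnLineage: List[ColumnLineage] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested ColumnLineage entries.
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "Lineage":
        raw = json.loads(text)
        return Lineage(
            inputTables=list(raw["inputTables"]),
            outputTables=list(raw["outputTables"]),
            columnLineage=[ColumnLineage(**c) for c in raw["columnLineage"]],
        )


def sources_of(lineage: Lineage, column: str) -> List[str]:
    """Return the source columns recorded for one output column (empty if absent)."""
    for entry in lineage.columnLineage:
        if entry.column == column:
            return entry.originalColumns
    return []


# The record for `select a as col0, b as col1 from test_table0`.
example = Lineage(
    inputTables=["default.test_table0"],
    outputTables=[],
    columnLineage=[
        ColumnLineage("col0", ["default.test_table0.a"]),
        ColumnLineage("col1", ["default.test_table0.b"]),
    ],
)
```

A plain `SELECT` writes no table, so `outputTables` stays empty. Statements such as `CreateTableAsSelect` or `InsertIntoHiveTable` would be expected to populate it.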

Column lineage is currently supported for the following Spark `Command` and `Query` `TreeNode`s:

### Query
- `Select`

### Command
- `CreateDataSourceTableAsSelectCommand`
- `CreateHiveTableAsSelectCommand`
- `OptimizedCreateHiveTableAsSelectCommand`
- `CreateTableAsSelect`
- `ReplaceTableAsSelect`
- `InsertIntoDataSourceCommand`
- `InsertIntoHadoopFsRelationCommand`
- `InsertIntoDataSourceDirCommand`
- `InsertIntoHiveDirCommand`
- `InsertIntoHiveTable`

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #3185 from iodone/kyuubi-2282.

Closes #2282

002c6d61 [odone] delete spark-sql-engine test for lineage
e1728a79 [odone] update lineage entity schema
de2a3e9a [odone] change kyuubi-spark-listener module to kyuubi-spark-lineage module
9258125e [odone] optimize lineage output
834669ed [odone] delete engine lineage parse
d9c7a3dc [odone] add spark listener to support lineage
4bae8c2f [odone] update for code cleaning
32b3392b [odone] update for review
fe09e478 [odone] add some test

Authored-by: odone <odone.zhang@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
2022-08-24 14:11:35 +08:00

| Name | Last commit | Date |
|---|---|---|
| kyuubi-codecov | [KYUUBI #2282] Add lineage records for sql statement execution in Kyuubi engine logs | 2022-08-24 14:11:35 +08:00 |
| kyuubi-tpcds | [KYUUBI #3298][FOLLOWUP] Fix CI failure | 2022-08-24 11:54:36 +08:00 |
| checkout_pr.sh | | |
| dependencyList | [KYUUBI #3162] Bump Hadoop 3.3.4 | 2022-08-09 10:48:22 +08:00 |
| merge_kyuubi_pr.py | [KYUUBI #1957] Skip html comments in merge commit test body from PR desc | 2022-02-22 14:19:49 +08:00 |
| reformat | [KYUUBI #2974][FEATURE] EOL Support for Spark 3.0 | 2022-07-12 11:01:50 +08:00 |