# Why are the changes needed?

## Issue reference

https://github.com/apache/kyuubi/issues/6912

## How to reproduce the issue?

This PR fixes a wrong result when generating an instance of `org.apache.kyuubi.plugin.lineage.Lineage` in the following case:

1. Create a temporary view from a file.
2. Insert into a table by selecting from the temporary view created in step 1.
3. Generate the lineage while executing the insert statement from step 2.

For details, see the unit test added in this patch.

## The issue analysis

Consider the current code that builds the Lineage object by resolving a LogicalPlan object:

<img width="694" alt="image" src="https://github.com/user-attachments/assets/65256a0d-320d-4271-968f-59eafb74de9f" />

With this logic, the "try-catch" self-protection yields `None` instead of an `org.apache.kyuubi.plugin.lineage.Lineage` object in this case. The `None` causes problems in the following two scenarios:

### Unit Test Environment

In the unit test, a "None.get" exception is raised when execution reaches this code:

<img width="682" alt="image" src="https://github.com/user-attachments/assets/102dc9bd-294f-4b1e-b1c6-01b6fee50fed" />

Here's the runtime exception stack:

```
None.get
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:529)
  at scala.None$.get(Option.scala:527)
  at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting(SparkSQLLineageParserHelperSuite.scala:1485)
  at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.$anonfun$new$83(SparkSQLLineageParserHelperSuite.scala:1465)
```

### Production Environment

The result cannot be used in the production environment because it is `None` and therefore lacks the necessary lineage information.
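
The failure mode described above can be reproduced in plain Scala without Spark. The following is a minimal sketch, not the plugin's code: `SketchLineage`, `NoneGetSketch`, `parse`, and `describe` are hypothetical names introduced only for illustration. It shows how a parser wrapped in try-catch turns a failure into `None`, why calling `.get` on that result throws `NoSuchElementException`, and how a caller can handle the empty case explicitly.

```scala
import scala.util.Try

// Hypothetical stand-in for org.apache.kyuubi.plugin.lineage.Lineage (not the real class).
final case class SketchLineage(inputTables: List[String], outputTables: List[String])

object NoneGetSketch {
  // Mimics a parser wrapped in try-catch: any exception is swallowed and becomes None.
  def parse(outputTable: String): Option[SketchLineage] =
    Try {
      if (outputTable.isEmpty) throw new IllegalStateException("simulated parse failure")
      SketchLineage(inputTables = Nil, outputTables = List(outputTable))
    }.toOption

  // Caller-side handling that never calls .get on the Option.
  def describe(lineage: Option[SketchLineage]): String = lineage match {
    case Some(l) => s"outputs=${l.outputTables.mkString(",")}"
    case None => "no lineage"
  }

  def main(args: Array[String]): Unit = {
    val failed = parse("")
    // Calling .get on None throws java.util.NoSuchElementException; capture it instead of crashing.
    val error = Try(failed.get).failed.toOption
    println(error.map(_.getClass.getName))
    println(describe(failed))
    println(describe(parse("test_db.test_table_from_dir")))
  }
}
```

Pattern matching on the `Option` only protects the caller. It does not recover the missing lineage, which is why the patch addresses the parsing logic itself.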
The correct content of the Lineage instance in the above case is:

```
inputTables(List())
outputTables(List(spark_catalog.test_db.test_table_from_dir))
columnLineage(List(ColumnLineage(spark_catalog.test_db.test_table_from_dir.a0,Set()), ColumnLineage(spark_catalog.test_db.test_table_from_dir.b0,Set())))
```

The newly added test case (`test directory to table`) passes after this issue is fixed.

# How to fix the issue?

Add an emptiness check (the "Empty judgment" logic). For details, see the code changes in this patch.

# How was this patch tested?

1. By adding a new test case to the unit tests and making sure it passes.
2. By submitting a Spark application that includes the SQL from this case in the production environment, and making sure a correct Lineage instance is generated instead of a None object.

# Was this patch authored or co-authored using generative AI tooling?

No

Closes #6911 from xglv1985/fix_spark_lineage_runtime_exception.

Closes #6912

13a71075d [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
4e89b95cd [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
59b350bfb [xglv1985] fix a runtime exception when generate column lineage tuple--more readable code
52bc0288d [xglv1985] fix a runtime exception when generate column lineage tuple--spotless sytle
fea6bbc0d [xglv1985] fix a runtime exception when generate column lineage tuple--remove tab from UT code
901879095 [xglv1985] fix a runtime exception when generate column lineage tuple--unit test
fbb4df879 [xglv1985] fix a runtime exception when generate column lineage tuple

Lead-authored-by: xglv1985 <xglv1985@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
# Kyuubi Spark Listener Extension

## Functions

- All `listener` extensions can be implemented in this module, like `QueryExecutionListener` and `ExtraListener`
- Add `SparkOperationLineageQueryExecutionListener` to extend Spark's `QueryExecutionListener`
- SQL lineage parsing is triggered after SQL execution, and the result is written to the JSON logger file

## Build

`build/mvn clean package -DskipTests -pl :kyuubi-spark-lineage_2.12 -am -Dspark.version=3.2.1`

### Supported Apache Spark Versions

`-Dspark.version=`

- master
- 3.5.x (default)
- 3.4.x
- 3.3.x
- 3.2.x
- 3.1.x