[KYUUBI #6546] Update incorrect descriptions in Zorder related configurations

# 🔍 Description
## Issue References 🔗

This pull request fixes #6546

## Describe Your Solution 🔧

Remove the outdated note "this config only affects with Spark 3.3.x" from the Zorder-related configuration descriptions, in both the documentation table and the `KyuubiSQLConf` doc strings.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6547 from huangxiaopingRD/6546.

Closes #6546

fab3b93c2 [huangxiaoping] Merge remote-tracking branch 'origin/6546'
17bd5ea0d [huangxiaoping] [KYUUBI #6546] Fix incorrect documentation description
8f53a8911 [huangxiaoping] [KYUUBI #6546] Fix incorrect documentation description
449d3f1ea [huangxiaoping] [KYUUBI #6546] Fix incorrect documentation description

Authored-by: huangxiaoping <1754789345@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
huangxiaoping 2024-07-19 10:27:39 +08:00 committed by Cheng Pan
parent 1309819749
commit 4040acb321
4 changed files with 39 additions and 45 deletions


@@ -65,31 +65,31 @@ Now, you can enjoy the Kyuubi SQL Extension.
 Kyuubi provides some configs to make these feature easy to use.
 | Name | Default Value | Description | Since |
 |------|---------------|-------------|-------|
 | spark.sql.optimizer.insertRepartitionBeforeWrite.enabled | true | Add repartition node at the top of query plan. An approach of merging small files. | 1.2.0 |
 | spark.sql.optimizer.forceShuffleBeforeJoin.enabled | false | Ensure shuffle node exists before shuffled join (shj and smj) to make AQE `OptimizeSkewedJoin` works (complex scenario join, multi table join). | 1.2.0 |
 | spark.sql.optimizer.finalStageConfigIsolation.enabled | false | If true, the final stage support use different config with previous stage. The prefix of final stage config key should be `spark.sql.finalStage.`. For example, the raw spark config: `spark.sql.adaptive.advisoryPartitionSizeInBytes`, then the final stage config should be: `spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes`. | 1.2.0 |
 | spark.sql.optimizer.insertZorderBeforeWriting.enabled | true | When true, we will follow target table properties to insert zorder or not. The key properties are: 1) `kyuubi.zorder.enabled`: if this property is true, we will insert zorder before writing data. 2) `kyuubi.zorder.cols`: string split by comma, we will zorder by these cols. | 1.4.0 |
 | spark.sql.optimizer.zorderGlobalSort.enabled | true | When true, we do a global sort using zorder. Note that, it can cause data skew issue if the zorder columns have less cardinality. When false, we only do local sort using zorder. | 1.4.0 |
 | spark.sql.watchdog.maxPartitions | none | Set the max partition number when spark scans a data source. Enable maxPartition Strategy by specifying this configuration. Add maxPartitions Strategy to avoid scan excessive partitions on partitioned table, it's optional that works with defined | 1.4.0 |
 | spark.sql.watchdog.maxFileSize | none | Set the maximum size in bytes of files when spark scans a data source. Enable maxFileSize Strategy by specifying this configuration. Add maxFileSize Strategy to avoid scan excessive size of files, it's optional that works with defined | 1.8.0 |
 | spark.sql.optimizer.dropIgnoreNonExistent | false | When true, do not report an error if DROP DATABASE/TABLE/VIEW/FUNCTION/PARTITION specifies a non-existent database/table/view/function/partition | 1.5.0 |
-| spark.sql.optimizer.rebalanceBeforeZorder.enabled | false | When true, we do a rebalance before zorder in case data skew. Note that, if the insertion is dynamic partition we will use the partition columns to rebalance. Note that, this config only affects with Spark 3.3.x. | 1.6.0 |
-| spark.sql.optimizer.rebalanceZorderColumns.enabled | false | When true and `spark.sql.optimizer.rebalanceBeforeZorder.enabled` is true, we do rebalance before Z-Order. If it's dynamic partition insert, the rebalance expression will include both partition columns and Z-Order columns. Note that, this config only affects with Spark 3.3.x. | 1.6.0 |
-| spark.sql.optimizer.twoPhaseRebalanceBeforeZorder.enabled | false | When true and `spark.sql.optimizer.rebalanceBeforeZorder.enabled` is true, we do two phase rebalance before Z-Order for the dynamic partition write. The first phase rebalance using dynamic partition column; The second phase rebalance using dynamic partition column Z-Order columns. Note that, this config only affects with Spark 3.3.x. | 1.6.0 |
-| spark.sql.optimizer.zorderUsingOriginalOrdering.enabled | false | When true and `spark.sql.optimizer.rebalanceBeforeZorder.enabled` is true, we do sort by the original ordering i.e. lexicographical order. Note that, this config only affects with Spark 3.3.x. | 1.6.0 |
+| spark.sql.optimizer.rebalanceBeforeZorder.enabled | false | When true, we do a rebalance before zorder in case data skew. Note that, if the insertion is dynamic partition we will use the partition columns to rebalance. | 1.6.0 |
+| spark.sql.optimizer.rebalanceZorderColumns.enabled | false | When true and `spark.sql.optimizer.rebalanceBeforeZorder.enabled` is true, we do rebalance before Z-Order. If it's dynamic partition insert, the rebalance expression will include both partition columns and Z-Order columns. | 1.6.0 |
+| spark.sql.optimizer.twoPhaseRebalanceBeforeZorder.enabled | false | When true and `spark.sql.optimizer.rebalanceBeforeZorder.enabled` is true, we do two phase rebalance before Z-Order for the dynamic partition write. The first phase rebalance using dynamic partition column; The second phase rebalance using dynamic partition column Z-Order columns. | 1.6.0 |
+| spark.sql.optimizer.zorderUsingOriginalOrdering.enabled | false | When true and `spark.sql.optimizer.rebalanceBeforeZorder.enabled` is true, we do sort by the original ordering i.e. lexicographical order. | 1.6.0 |
 | spark.sql.optimizer.inferRebalanceAndSortOrders.enabled | false | When ture, infer columns for rebalance and sort orders from original query, e.g. the join keys from join. It can avoid compression ratio regression. | 1.7.0 |
 | spark.sql.optimizer.inferRebalanceAndSortOrdersMaxColumns | 3 | The max columns of inferred columns. | 1.7.0 |
 | spark.sql.optimizer.insertRepartitionBeforeWriteIfNoShuffle.enabled | false | When true, add repartition even if the original plan does not have shuffle. | 1.7.0 |
 | spark.sql.optimizer.finalStageConfigIsolationWriteOnly.enabled | true | When true, only enable final stage isolation for writing. | 1.7.0 |
 | spark.sql.finalWriteStage.eagerlyKillExecutors.enabled | false | When true, eagerly kill redundant executors before running final write stage. | 1.8.0 |
 | spark.sql.finalWriteStage.skipKillingExecutorsForTableCache | true | When true, skip killing executors if the plan has table caches. | 1.8.0 |
 | spark.sql.finalWriteStage.retainExecutorsFactor | 1.2 | If the target executors * factor < active executors, and target executors * factor > min executors, then inject kill executors or inject custom resource profile. | 1.8.0 |
 | spark.sql.finalWriteStage.resourceIsolation.enabled | false | When true, make final write stage resource isolation using custom RDD resource profile. | 1.8.0 |
 | spark.sql.finalWriteStageExecutorCores | fallback spark.executor.cores | Specify the executor core request for final write stage. It would be passed to the RDD resource profile. | 1.8.0 |
 | spark.sql.finalWriteStageExecutorMemory | fallback spark.executor.memory | Specify the executor on heap memory request for final write stage. It would be passed to the RDD resource profile. | 1.8.0 |
 | spark.sql.finalWriteStageExecutorMemoryOverhead | fallback spark.executor.memoryOverhead | Specify the executor memory overhead request for final write stage. It would be passed to the RDD resource profile. | 1.8.0 |
 | spark.sql.finalWriteStageExecutorOffHeapMemory | NONE | Specify the executor off heap memory request for final write stage. It would be passed to the RDD resource profile. | 1.8.0 |
 | spark.sql.execution.scriptTransformation.enabled | true | When false, script transformation is not allowed. | 1.9.0 |
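As background on what these Zorder configs do: the optimizer sorts rows by an interleaved-bit (Morton) value built from the chosen columns. The sketch below is not Kyuubi's actual implementation (which rewrites the logical plan with Catalyst expressions), and `ZorderSketch`/`interleave` are hypothetical names; it only illustrates why sorting by the interleaved value clusters rows that are close on both columns.

```scala
// Hypothetical sketch of Z-order (Morton) interleaving for two Int columns.
// Bit i of x maps to bit 2*i of the result; bit i of y maps to bit 2*i+1.
// Sorting rows by this value keeps rows that are close in BOTH columns near
// each other in the output, which is what makes Z-order useful for data skipping.
object ZorderSketch {
  def interleave(x: Int, y: Int): Long = {
    var z = 0L
    var i = 0
    while (i < 32) {
      z |= ((x.toLong >> i) & 1L) << (2 * i)     // x contributes the even bits
      z |= ((y.toLong >> i) & 1L) << (2 * i + 1) // y contributes the odd bits
      i += 1
    }
    z
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq((2, 0), (0, 1), (1, 1), (0, 0), (1, 0))
    // Analogous to "zorder by x, y": sort by the interleaved value.
    println(rows.sortBy { case (x, y) => interleave(x, y) })
  }
}
```

Kyuubi's real rule applies this kind of ordering before writing when the target table sets the `kyuubi.zorder.enabled` and `kyuubi.zorder.cols` properties, as described in the table above.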


@@ -71,7 +71,7 @@ object KyuubiSQLConf {
     buildConf("spark.sql.optimizer.rebalanceBeforeZorder.enabled")
       .doc("When true, we do a rebalance before zorder in case data skew. " +
         "Note that, if the insertion is dynamic partition we will use the partition " +
-        "columns to rebalance. Note that, this config only affects with Spark 3.3.x")
+        "columns to rebalance.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -80,8 +80,7 @@ object KyuubiSQLConf {
     buildConf("spark.sql.optimizer.rebalanceZorderColumns.enabled")
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do rebalance before " +
         s"Z-Order. If it's dynamic partition insert, the rebalance expression will include " +
-        s"both partition columns and Z-Order columns. Note that, this config only " +
-        s"affects with Spark 3.3.x")
+        s"both partition columns and Z-Order columns.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -91,7 +90,7 @@ object KyuubiSQLConf {
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do two phase rebalance " +
         s"before Z-Order for the dynamic partition write. The first phase rebalance using " +
         s"dynamic partition column; The second phase rebalance using dynamic partition column + " +
-        s"Z-Order columns. Note that, this config only affects with Spark 3.3.x")
+        s"Z-Order columns.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -99,8 +98,7 @@ object KyuubiSQLConf {
   val ZORDER_USING_ORIGINAL_ORDERING_ENABLED =
     buildConf("spark.sql.optimizer.zorderUsingOriginalOrdering.enabled")
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do sort by " +
-        s"the original ordering i.e. lexicographical order. Note that, this config only " +
-        s"affects with Spark 3.3.x")
+        s"the original ordering i.e. lexicographical order.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
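The `buildConf(...).doc(...).version(...).booleanConf.createWithDefault(...)` chain in the hunks above is Kyuubi's internal fluent config builder. A rough, hypothetical mimic of how such a builder produces a typed config entry (the class names below are illustrative, not the real `KyuubiSQLConf` internals):

```scala
// Hypothetical, minimal mimic of the fluent config-builder pattern seen in
// KyuubiSQLConf above. Illustration only, not the real Kyuubi API.
final case class ConfigEntry[T](key: String, doc: String, version: String, defaultValue: T)

final class ConfigBuilder(key: String) {
  private var docText: String = ""
  private var sinceVersion: String = ""
  def doc(s: String): ConfigBuilder = { docText = s; this }
  def version(v: String): ConfigBuilder = { sinceVersion = v; this }
  // Fix the value type, then finish the chain with a default.
  def booleanConf: TypedConfigBuilder[Boolean] =
    new TypedConfigBuilder[Boolean](key, docText, sinceVersion)
}

final class TypedConfigBuilder[T](key: String, doc: String, version: String) {
  def createWithDefault(default: T): ConfigEntry[T] =
    ConfigEntry(key, doc, version, default)
}

object ConfSketch {
  def buildConf(key: String): ConfigBuilder = new ConfigBuilder(key)

  // Mirrors the entry being edited in this PR (with the corrected doc text).
  val REBALANCE_BEFORE_ZORDER: ConfigEntry[Boolean] =
    buildConf("spark.sql.optimizer.rebalanceBeforeZorder.enabled")
      .doc("When true, we do a rebalance before zorder in case data skew. " +
        "Note that, if the insertion is dynamic partition we will use the partition " +
        "columns to rebalance.")
      .version("1.6.0")
      .booleanConf
      .createWithDefault(false)
}
```

The point of the pattern is that `.doc(...)` edits like this PR's only touch metadata: the key, type, and default stay untouched, so behavior is unchanged.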


@@ -71,7 +71,7 @@ object KyuubiSQLConf {
     buildConf("spark.sql.optimizer.rebalanceBeforeZorder.enabled")
       .doc("When true, we do a rebalance before zorder in case data skew. " +
         "Note that, if the insertion is dynamic partition we will use the partition " +
-        "columns to rebalance. Note that, this config only affects with Spark 3.3.x")
+        "columns to rebalance.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -80,8 +80,7 @@ object KyuubiSQLConf {
     buildConf("spark.sql.optimizer.rebalanceZorderColumns.enabled")
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do rebalance before " +
         s"Z-Order. If it's dynamic partition insert, the rebalance expression will include " +
-        s"both partition columns and Z-Order columns. Note that, this config only " +
-        s"affects with Spark 3.3.x")
+        s"both partition columns and Z-Order columns.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -91,7 +90,7 @@ object KyuubiSQLConf {
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do two phase rebalance " +
         s"before Z-Order for the dynamic partition write. The first phase rebalance using " +
         s"dynamic partition column; The second phase rebalance using dynamic partition column + " +
-        s"Z-Order columns. Note that, this config only affects with Spark 3.3.x")
+        s"Z-Order columns.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -99,8 +98,7 @@ object KyuubiSQLConf {
   val ZORDER_USING_ORIGINAL_ORDERING_ENABLED =
     buildConf("spark.sql.optimizer.zorderUsingOriginalOrdering.enabled")
      .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do sort by " +
-        s"the original ordering i.e. lexicographical order. Note that, this config only " +
-        s"affects with Spark 3.3.x")
+        s"the original ordering i.e. lexicographical order.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)


@@ -71,7 +71,7 @@ object KyuubiSQLConf {
     buildConf("spark.sql.optimizer.rebalanceBeforeZorder.enabled")
       .doc("When true, we do a rebalance before zorder in case data skew. " +
         "Note that, if the insertion is dynamic partition we will use the partition " +
-        "columns to rebalance. Note that, this config only affects with Spark 3.3.x")
+        "columns to rebalance.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -80,8 +80,7 @@ object KyuubiSQLConf {
     buildConf("spark.sql.optimizer.rebalanceZorderColumns.enabled")
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do rebalance before " +
         s"Z-Order. If it's dynamic partition insert, the rebalance expression will include " +
-        s"both partition columns and Z-Order columns. Note that, this config only " +
-        s"affects with Spark 3.3.x")
+        s"both partition columns and Z-Order columns.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -91,7 +90,7 @@ object KyuubiSQLConf {
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do two phase rebalance " +
         s"before Z-Order for the dynamic partition write. The first phase rebalance using " +
         s"dynamic partition column; The second phase rebalance using dynamic partition column + " +
-        s"Z-Order columns. Note that, this config only affects with Spark 3.3.x")
+        s"Z-Order columns.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)
@@ -99,8 +98,7 @@ object KyuubiSQLConf {
   val ZORDER_USING_ORIGINAL_ORDERING_ENABLED =
     buildConf("spark.sql.optimizer.zorderUsingOriginalOrdering.enabled")
       .doc(s"When true and ${REBALANCE_BEFORE_ZORDER.key} is true, we do sort by " +
-        s"the original ordering i.e. lexicographical order. Note that, this config only " +
-        s"affects with Spark 3.3.x")
+        s"the original ordering i.e. lexicographical order.")
       .version("1.6.0")
       .booleanConf
       .createWithDefault(false)