kyuubi/dev/kyuubi-tpcds

Introduction

This module includes a TPC-DS data generator and a benchmark tool.

How to use

Package the jar with the following command: ./build/mvn clean package -Ptpcds -pl dev/kyuubi-tpcds -am

Data Generator

Supported options:

| key | default | description |
|---|---|---|
| db | default | the database to write data to |
| scaleFactor | 1 | the scale factor of TPC-DS |
| format | parquet | the format of the tables that store the data |
| parallel | scaleFactor * 2 | the parallelism of the Spark job |

Example: the following command generates 10 GB of data in a new database, tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.DataGenerator \
  kyuubi-tpcds_*.jar \
  --db tpcds_sf10 --scaleFactor 10 --format parquet --parallel 20
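
The defaults in the options table can be made concrete with a small sketch. This is not part of the module: the helper name `data_generator_args` is hypothetical, and the only behavior it encodes is what the table states, namely that `parallel` falls back to `scaleFactor * 2` when it is not given.

```python
import shlex


def data_generator_args(db="default", scale_factor=1, fmt="parquet", parallel=None):
    """Build the DataGenerator argument list (hypothetical helper).

    Defaults mirror the options table above; nothing here calls Spark.
    """
    if parallel is None:
        # Documented default: parallel = scaleFactor * 2
        parallel = scale_factor * 2
    return [
        "--db", db,
        "--scaleFactor", str(scale_factor),
        "--format", fmt,
        "--parallel", str(parallel),
    ]


# Only db and scaleFactor are set; format and parallel use the defaults.
args = data_generator_args(db="tpcds_sf10", scale_factor=10)
print(shlex.join(args))
```

With `scaleFactor` set to 10 the derived parallelism is 20, which is the value passed explicitly in the example command above.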

Benchmark Tool

Supported options:

| key | default | description |
|---|---|---|
| db | none (required) | the TPC-DS database |
| benchmark | tpcds-v2.4-benchmark | the name of the application |
| iterations | 3 | the number of iterations to run |
| breakdown | false | whether to record breakdown results of an execution |
| results-dir | /spark/sql/performance | directory to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref |
| include | none (optional) | names of the queries to run, separated by commas, e.g. q1,q2 |
| exclude | none (optional) | names of the queries to exclude, separated by commas, e.g. q2,q4 |

Example: the following command benchmarks TPC-DS sf10 against the existing database tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10

You can also run only a specified subset of the TPC-DS queries:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --include q1,q2

The benchmark result looks like this:

| name | minTimeMs | maxTimeMs | avgTimeMs | stdDev | stdDevPercent |
|---|---|---|---|---|---|
| q1-v2.4 | 8329.884508 | 14159.307004 | 10537.235825 | 3161.74253777417 | 30.0054263782615 |
| q2-v2.4 | 16600.979609 | 18932.613523 | 18137.6516166666 | 1331.06332796139 | 7.33867512781137 |
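
How the summary columns relate to each other can be sketched in a few lines. The sketch below is an inference from the sample rows, not taken from the tool's source: the q1 row is consistent with `stdDev` being the sample standard deviation (dividing by n - 1) and `stdDevPercent` being `stdDev / avgTimeMs * 100`. The function name `summarize` is hypothetical, and the middle run time is recovered from the reported values on the assumption that the default of 3 iterations was used.

```python
import statistics


def summarize(times_ms):
    """Summarize per-iteration run times (hypothetical helper).

    Assumes sample standard deviation, which matches the q1 row above.
    """
    avg = statistics.mean(times_ms)
    std_dev = statistics.stdev(times_ms)  # sample std dev, needs >= 2 runs
    return {
        "minTimeMs": min(times_ms),
        "maxTimeMs": max(times_ms),
        "avgTimeMs": avg,
        "stdDev": std_dev,
        "stdDevPercent": std_dev / avg * 100,
    }


# Values reported for q1-v2.4 in the table above.
q1_min, q1_max, q1_avg = 8329.884508, 14159.307004, 10537.235825
# With 3 iterations, the unreported middle run is 3 * avg - min - max.
q1_mid = 3 * q1_avg - q1_min - q1_max

for column, value in summarize([q1_min, q1_mid, q1_max]).items():
    print(f"{column}: {value:.4f}")
```

A high `stdDevPercent`, such as the roughly 30% for q1, indicates that the iterations differed widely, so comparing runs on `avgTimeMs` alone may be misleading for that query.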

To exclude some queries, use exclude:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --exclude q2,q4

The benchmark result looks like this:

| name | minTimeMs | maxTimeMs | avgTimeMs | stdDev | stdDevPercent |
|---|---|---|---|---|---|
| q1-v2.4 | 8329.884508 | 14159.307004 | 10537.235825 | 3161.74253777417 | 30.0054263782615 |
| q3-v2.4 | 3841.009061 | 4685.16345 | 4128.583224 | 482.102016761038 | 11.6771781166603 |
| q5-v2.4 | 39405.654981 | 48845.359253 | 43530.6847113333 | 4830.98802198401 | 11.0978911864583 |
| q6-v2.4 | 2998.962221 | 7793.096796 | 4658.37355366666 | 2716.310089792 | 58.3102677039276 |
| ... | ... | ... | ... | ... | ... |
| q99-v2.4 | 11747.22389 | 11900.570288 | 11813.018609 | 78.9544389266673 | 0.668368022941351 |

When both include and exclude are given, the set of queries executed is include minus exclude:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --include q1,q2,q3,q4,q5 --exclude q2,q4

The benchmark result looks like this:

| name | minTimeMs | maxTimeMs | avgTimeMs | stdDev | stdDevPercent |
|---|---|---|---|---|---|
| q1-v2.4 | 8329.884508 | 14159.307004 | 10537.235825 | 3161.74253777417 | 30.0054263782615 |
| q3-v2.4 | 3841.009061 | 4685.16345 | 4128.583224 | 482.102016761038 | 11.6771781166603 |
| q5-v2.4 | 39405.654981 | 48845.359253 | 43530.6847113333 | 4830.98802198401 | 11.0978911864583 |
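
The "include minus exclude" rule can be illustrated with a minimal sketch. It is not the tool's implementation: the function names are hypothetical, the list of available queries is a stand-in (q1 to q99), and exact name matching is an assumption, since the README does not say how names are compared.

```python
def parse_names(value):
    """Split a comma-separated option value into a list of query names."""
    if not value:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]


def select_queries(available, include=None, exclude=None):
    """Return the queries to run: include (or all, if unset) minus exclude."""
    included = set(parse_names(include))
    excluded = set(parse_names(exclude))
    # No include option means every available query is a candidate.
    candidates = [q for q in available if q in included] if included else list(available)
    return [q for q in candidates if q not in excluded]


# Hypothetical stand-in for the query names known to the benchmark.
available = [f"q{i}" for i in range(1, 100)]

print(select_queries(available, include="q1,q2,q3,q4,q5", exclude="q2,q4"))
```

For the options used in the command above, the sketch selects q1, q3 and q5, which are the three queries that appear in the result table.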