
Introduction

This module includes a TPC-DS data generator and a benchmark tool.

How to use

Package the jar with the following command:

./build/mvn clean package -Ptpcds -pl dev/kyuubi-tpcds -am

Data Generator

Supported options:

key	default	description
db	default	the database to write the data into
scaleFactor	1	the TPC-DS scale factor
format	parquet	the storage format of the generated tables
parallel	scaleFactor * 2	the parallelism of the Spark job

Example: the following command generates 10 GB of data into a new database, tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.DataGenerator \
  kyuubi-tpcds_*.jar \
  --db tpcds_sf10 --scaleFactor 10 --format parquet --parallel 20
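The defaults in the table depend on each other: when --parallel is omitted, it falls back to twice the scale factor. As a minimal sketch, assuming a bash shell, the hypothetical helper tpcds_datagen_args below (not part of Kyuubi) fills in the documented defaults and prints the resulting DataGenerator argument list:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Kyuubi: prints DataGenerator arguments with
# the documented defaults applied. Positional args: db, scaleFactor, format, parallel.
tpcds_datagen_args() {
  local db="${1:-default}"
  local scale_factor="${2:-1}"
  local format="${3:-parquet}"
  # parallel defaults to scaleFactor * 2, as listed in the options table.
  local parallel="${4:-$(( scale_factor * 2 ))}"
  echo "--db ${db} --scaleFactor ${scale_factor} --format ${format} --parallel ${parallel}"
}

# Example (word splitting of the output is intentional):
# $SPARK_HOME/bin/spark-submit \
#   --class org.apache.kyuubi.tpcds.DataGenerator \
#   kyuubi-tpcds_*.jar $(tpcds_datagen_args tpcds_sf10 10)
```

Passing only the database and the scale factor, as in `tpcds_datagen_args tpcds_sf10 10`, yields the same arguments as the sf10 example: `--parallel 20` is derived rather than typed.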

Benchmark Tool

Supported options:

key	default	description
db	none (required)	the TPC-DS database to benchmark
benchmark	tpcds-v2.4-benchmark	the name of the application
iterations	3	the number of iterations to run
breakdown	false	whether to record breakdown results of an execution
results-dir	/spark/sql/performance	the directory to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref
include	none (optional)	comma-separated names of the queries to run, e.g. q1,q2
exclude	none (optional)	comma-separated names of the queries to skip, e.g. q2,q4

Example: the following command benchmarks TPC-DS SF10 against the existing database tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10

You can also run only a specified subset of the TPC-DS queries with include:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --include q1,q2

The benchmark results look like this:

name	minTimeMs	maxTimeMs	avgTimeMs	stdDev	stdDevPercent
q1-v2.4	8329.884508	14159.307004	10537.235825	3161.74253777417	30.0054263782615
q2-v2.4	16600.979609	18932.613523	18137.6516166666	1331.06332796139	7.33867512781137
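Reading the sample rows, stdDevPercent equals stdDev / avgTimeMs × 100 (for q1, 3161.74 / 10537.24 ≈ 30.005%), and with the default of three iterations stdDev matches the sample standard deviation (dividing by n - 1). This is inferred from the numbers, not taken from the tool's source. The sketch below recomputes the columns from per-iteration timings in milliseconds; tpcds_stats is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "min max avg stdDev stdDevPercent" for the
# per-iteration timings (ms) given as arguments.
# Assumes stdDev is the sample standard deviation (n - 1) and
# stdDevPercent = stdDev / avg * 100, as inferred from the sample output.
tpcds_stats() {
  printf '%s\n' "$@" | awk '
    { v = $1 + 0; x[NR] = v; sum += v
      if (NR == 1 || v < min) min = v
      if (NR == 1 || v > max) max = v }
    END {
      avg = sum / NR
      for (i = 1; i <= NR; i++) ss += (x[i] - avg) ^ 2
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      printf "%.6f %.6f %.6f %.6f %.6f\n", min, max, avg, sd, sd / avg * 100
    }'
}
```

For example, q1's min and max together with the third timing implied by its average (3 × 10537.235825 - 8329.884508 - 14159.307004 = 9122.515963) give back a stdDev of about 3161.74 and a stdDevPercent of about 30.005, matching the row above.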

To skip some queries, use exclude:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --exclude q2,q4

The benchmark results look like this:

name	minTimeMs	maxTimeMs	avgTimeMs	stdDev	stdDevPercent
q1-v2.4	8329.884508	14159.307004	10537.235825	3161.74253777417	30.0054263782615
q3-v2.4	3841.009061	4685.16345	4128.583224	482.102016761038	11.6771781166603
q5-v2.4	39405.654981	48845.359253	43530.6847113333	4830.98802198401	11.0978911864583
q6-v2.4	2998.962221	7793.096796	4658.37355366666	2716.310089792	58.3102677039276
...	...	...	...	...	...
q99-v2.4	11747.22389	11900.570288	11813.018609	78.9544389266673	0.668368022941351

When both include and exclude are specified, the queries executed are those in include minus those in exclude:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --include q1,q2,q3,q4,q5 --exclude q2,q4

The benchmark results look like this:

name	minTimeMs	maxTimeMs	avgTimeMs	stdDev	stdDevPercent
q1-v2.4	8329.884508	14159.307004	10537.235825	3161.74253777417	30.0054263782615
q3-v2.4	3841.009061	4685.16345	4128.583224	482.102016761038	11.6771781166603
q5-v2.4	39405.654981	48845.359253	43530.6847113333	4830.98802198401	11.0978911864583
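The include-minus-exclude rule is a plain set difference. A minimal bash sketch (bash 4+ for associative arrays), where effective_queries is a hypothetical helper and not Kyuubi's implementation; it only covers the case where include is given, and it keeps the order of the include list:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not Kyuubi's code: prints the queries left after
# removing the exclude list from the include list (both comma-separated).
# Requires bash 4+ for the associative array.
effective_queries() {
  local include="$1" exclude="$2" q
  local -a incl=() excl=() result=()
  local -A skip=()
  # Split both lists on commas.
  IFS=',' read -r -a incl <<< "$include"
  IFS=',' read -r -a excl <<< "$exclude"
  # Mark every excluded query name (empty names are ignored).
  for q in "${excl[@]}"; do
    [[ -n "$q" ]] && skip["$q"]=1
  done
  # Keep included queries that are not marked, in their original order.
  for q in "${incl[@]}"; do
    [[ -z "$q" ]] && continue
    [[ -n "${skip[$q]:-}" ]] || result+=("$q")
  done
  # Join the survivors with commas; the IFS change stays inside the subshell.
  (IFS=','; echo "${result[*]}")
}
```

With the arguments from the command above, `effective_queries q1,q2,q3,q4,q5 q2,q4` prints `q1,q3,q5`, the same three queries that appear in the result table.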