Introduction
This module includes a TPC-DS data generator and a benchmark tool.
How to use
Package the jar with the following command:
./build/mvn clean package -Ptpcds -pl dev/kyuubi-tpcds -am
Data Generator
Supported options:
| key | default | description |
|---|---|---|
| db | default | the database to write data to |
| scaleFactor | 1 | the scale factor of TPC-DS |
| format | parquet | the table format used to store data |
| parallel | scaleFactor * 2 | the parallelism of the Spark job |
Example: the following command generates 10 GB of data in a new database, tpcds_sf10.
$SPARK_HOME/bin/spark-submit \
--class org.apache.kyuubi.tpcds.DataGenerator \
kyuubi-tpcds_*.jar \
--db tpcds_sf10 --scaleFactor 10 --format parquet --parallel 20
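The `parallel` default in the options table is derived from `scaleFactor`, so the explicit `--parallel 20` in the example matches what the default would be for scale factor 10. A minimal sketch of that relationship follows; the helper names `default_parallel` and `datagen_args` are hypothetical and not part of this module, and the `tpcds_sf<scaleFactor>` database name simply follows the naming used in the example.

```shell
# Hypothetical helper: default parallelism per the options table (scaleFactor * 2).
default_parallel() {
  echo $(( $1 * 2 ))
}

# Hypothetical helper: print the generator arguments for a given scale factor,
# naming the database tpcds_sf<scaleFactor> as in the example above.
datagen_args() {
  sf="$1"
  echo "--db tpcds_sf${sf} --scaleFactor ${sf} --format parquet --parallel $(default_parallel "$sf")"
}

# prints: --db tpcds_sf10 --scaleFactor 10 --format parquet --parallel 20
datagen_args 10
```

The printed arguments can be appended to the `spark-submit` invocation after the jar name.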
Benchmark Tool
Supported options:
| key | default | description |
|---|---|---|
| db | none (required) | the TPC-DS database |
| benchmark | tpcds-v2.4-benchmark | the name of the application |
| iterations | 3 | the number of iterations to run |
| breakdown | false | whether to record breakdown results of an execution |
| filter | a | filter on the name of the queries to run, e.g. q1-v2.4 |
| results-dir | /spark/sql/performance | dir to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref |
Example: the following command benchmarks TPC-DS at scale factor 10 against the existing database tpcds_sf10.
$SPARK_HOME/bin/spark-submit \
--class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
kyuubi-tpcds_*.jar --db tpcds_sf10
Running a single TPC-DS query is also supported:
$SPARK_HOME/bin/spark-submit \
--class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
kyuubi-tpcds_*.jar --db tpcds_sf10 --filter q1-v2.4
The result of the TPC-DS benchmark looks like:
| name | minTimeMs | maxTimeMs | avgTimeMs | stdDev | stdDevPercent |
|---|---|---|---|---|---|
| q1-v2.4 | 50.522384 | 868.010383 | 323.398267 | 471.6482 | 145.8413108576 |
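The `stdDevPercent` column in the sample row appears to be the standard deviation expressed as a percentage of the average time. That formula is an assumption inferred from the numbers in the row, not taken from the tool's source. A small sketch to check it, using a hypothetical helper `std_dev_percent`:

```shell
# Assumption: stdDevPercent = stdDev / avgTimeMs * 100 (inferred from the sample row).
std_dev_percent() {
  awk -v sd="$1" -v avg="$2" 'BEGIN { printf "%.2f\n", sd / avg * 100 }'
}

# stdDev and avgTimeMs taken from the q1-v2.4 row; prints 145.84
std_dev_percent 471.6482 323.398267
```

A value above 100 means the spread between iterations exceeds the average run time, which in this row reflects the large gap between `minTimeMs` and `maxTimeMs`.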