ulysses-you 37a4e5c0da
[KYUUBI #1496] Support tpcds benchmark

### _Why are the changes needed?_
Support the TPC-DS benchmark in the `dev/kyuubi-tpcds` module.

Add a `README.md` to the `dev/kyuubi-tpcds` module describing how to use it.

The main code comes from [databricks-spark-sql-perf](https://github.com/databricks/spark-sql-perf).

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #1496 from ulysses-you/tpcds-benchmark.

Closes #1496

d4afe2d1 [ulysses-you] comment
54a146ef [ulysses-you] pom
91e71692 [ulysses-you] docs
20eadc49 [ulysses-you] benchmark

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
2021-12-06 20:08:08 +08:00
src/main [KYUUBI #1496] Support tpcds benchmark 2021-12-06 20:08:08 +08:00
pom.xml [KYUUBI #1496] Support tpcds benchmark 2021-12-06 20:08:08 +08:00
README.md [KYUUBI #1496] Support tpcds benchmark 2021-12-06 20:08:08 +08:00

Introduction

This module includes a TPC-DS data generator and a TPC-DS benchmark.

How to use

Package the jar with the following command, run from the root of the Kyuubi source tree:

./build/mvn install -DskipTests -Ptpcds -pl dev/kyuubi-tpcds -am

Data generator

Supported options:

| key | default | description |
| --- | --- | --- |
| db | default | the database to write data into |
| scaleFactor | 1 | the TPC-DS scale factor |

Example: the following command generates 10GB of data into a new database, tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.DataGenerator \
  kyuubi-tpcds-*.jar --db tpcds_sf10 --scaleFactor 10
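The example above names the database after its scale factor (tpcds_sf10 for scale factor 10). The following is a minimal sketch of a shell wrapper that keeps that `tpcds_sf<scaleFactor>` convention when generating several data sets. `tpcds_generate`, `SPARK_SUBMIT` and `TPCDS_JAR` are hypothetical names introduced here. They are not part of the module.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the DataGenerator call shown above.
# SPARK_SUBMIT (default: $SPARK_HOME/bin/spark-submit) and TPCDS_JAR are
# assumed variable names, not something the kyuubi-tpcds module defines.
tpcds_generate() {
  local sf="$1"
  [ -n "$sf" ] || return 1   # a scale factor is required
  local spark_submit="${SPARK_SUBMIT:-$SPARK_HOME/bin/spark-submit}"
  local jar="${TPCDS_JAR:?set TPCDS_JAR to the kyuubi-tpcds jar}"
  # Database name follows the tpcds_sf<scaleFactor> convention from the example.
  "$spark_submit" \
    --class org.apache.kyuubi.tpcds.DataGenerator \
    "$jar" --db "tpcds_sf${sf}" --scaleFactor "$sf"
}
```

For example, `for sf in 1 10 100; do tpcds_generate "$sf"; done` submits one generation job per scale factor, targeting tpcds_sf1, tpcds_sf10 and tpcds_sf100. Setting `SPARK_SUBMIT=echo` prints the commands instead of running them, which is a cheap dry run.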

Benchmark

Supported options:

| key | default | description |
| --- | --- | --- |
| db | none (required) | the TPC-DS database |
| benchmark | tpcds-v2.4-benchmark | the name of the application |
| iterations | 3 | the number of iterations to run |
| filter | none | a filter on the names of the queries to run, e.g. q1-v2.4 |

Example: the following command runs the benchmark at scale factor 10 against the existing database tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds-*.jar --db tpcds_sf10

To run only one of the TPC-DS queries, pass its name to --filter:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds-*.jar --db tpcds_sf10 --filter q1-v2.4
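To run a handful of queries one at a time with --filter, a loop like the sketch below can help. It issues one spark-submit per query and stops at the first failed submission. The helper name and the `SPARK_SUBMIT`/`TPCDS_JAR` variables are hypothetical. Query names other than q1-v2.4 are assumed to follow the same `q<N>-v2.4` pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each named query in its own RunBenchmark submission.
# Usage: tpcds_run_queries <db> <query-name>...
tpcds_run_queries() {
  local db="$1"
  [ -n "$db" ] || return 1   # the database is required, as with --db
  shift
  local spark_submit="${SPARK_SUBMIT:-$SPARK_HOME/bin/spark-submit}"
  local jar="${TPCDS_JAR:?set TPCDS_JAR to the kyuubi-tpcds jar}"
  local q
  for q in "$@"; do
    # --filter selects the query by name, as in the single-query example.
    "$spark_submit" \
      --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
      "$jar" --db "$db" --filter "$q" || return 1   # stop at the first failure
  done
}
```

For example: `tpcds_run_queries tpcds_sf10 q1-v2.4 q2-v2.4 q3-v2.4`.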

The benchmark output looks like this:

| name | minTimeMs | maxTimeMs | avgTimeMs | stdDev | stdDevPercent |
| --- | --- | --- | --- | --- | --- |
| q1-v2.4 | 50.522384 | 868.010383 | 323.398267 | 471.6482 | 145.8413108576 |
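The summary columns can be recomputed from per-iteration execution times. The sketch below makes two assumptions: `stdDev` is the sample standard deviation (n - 1 denominator), and `stdDevPercent = 100 * stdDev / avgTimeMs`. With the default of 3 iterations, the q1-v2.4 row above is consistent with both definitions. The helper name is hypothetical and not part of the module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute one result row from per-iteration times (ms).
# Usage: tpcds_row_stats <query-name> <time1> <time2> ...
# Prints: name minTimeMs maxTimeMs avgTimeMs stdDev stdDevPercent
tpcds_row_stats() {
  local name="$1"
  shift
  [ "$#" -gt 0 ] || return 1   # at least one timing is required
  printf '%s\n' "$@" | awk -v name="$name" '
    {
      x = $1 + 0                      # force numeric comparison
      t[NR] = x
      sum += x
      if (NR == 1 || x < min) min = x
      if (NR == 1 || x > max) max = x
    }
    END {
      avg = sum / NR
      for (i = 1; i <= NR; i++) ss += (t[i] - avg) ^ 2
      # Sample standard deviation (n - 1), assumed to match the stdDev column.
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      pct = (avg != 0) ? 100 * sd / avg : 0
      printf "%s %.6f %.6f %.6f %.6f %.4f\n", name, min, max, avg, sd, pct
    }'
}
```

With 3 iterations, min, max and avg fix the middle timing at 3 * 323.398267 - 50.522384 - 868.010383 = 51.662034 ms. Running `tpcds_row_stats q1-v2.4 50.522384 51.662034 868.010383` therefore reproduces the row above. It prints stdDev to six decimal places (about 471.648) and stdDevPercent as about 145.84.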