### _Why are the changes needed?_

Support the TPC-DS benchmark in the `dev/kyuubi-tpcds` module, and add a `README.md` to the module that shows how to use it. The main code is from [databricks-spark-sql-perf](https://github.com/databricks/spark-sql-perf).

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #1496 from ulysses-you/tpcds-benchmark.

d4afe2d1 [ulysses-you] comment
54a146ef [ulysses-you] pom
91e71692 [ulysses-you] docs
20eadc49 [ulysses-you] benchmark

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
Introduction
This module includes a TPC-DS data generator and a TPC-DS benchmark.
How to use
Package the jar with the following command:
./build/mvn install -DskipTests -Ptpcds -pl dev/kyuubi-tpcds -am
data generator
Supported options:
| key | default | description |
|---|---|---|
| db | default | the database to write the data to |
| scaleFactor | 1 | the TPC-DS scale factor |
Example: the following command generates 10 GB of data in a new database, tpcds_sf10.
$SPARK_HOME/bin/spark-submit \
--class org.apache.kyuubi.tpcds.DataGenerator \
kyuubi-tpcds-*.jar --db tpcds_sf10 --scaleFactor 10
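To prepare several data sizes, the same command can be repeated once per scale factor. The sketch below only prints the commands (a dry run) so they can be reviewed before being executed. The helper name `datagen_cmd` and the list of scale factors are hypothetical; the `tpcds_sf<N>` database naming simply follows the example above.

```shell
#!/bin/sh
# Print (dry run) one DataGenerator command per scale factor.
# datagen_cmd is a hypothetical helper; it does not launch Spark.
datagen_cmd() {
  sf="$1"
  # \$SPARK_HOME and the jar glob are kept literal so the printed line
  # matches the command shown in this README.
  printf '%s\n' "\$SPARK_HOME/bin/spark-submit --class org.apache.kyuubi.tpcds.DataGenerator kyuubi-tpcds-*.jar --db tpcds_sf${sf} --scaleFactor ${sf}"
}

# Illustrative scale factors; pick the ones you need.
for sf in 1 10 100; do
  datagen_cmd "$sf"
done
```

Each printed line can be copied into a shell where `SPARK_HOME` is set and the packaged jar is in the current directory.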
Run the benchmark
Supported options:
| key | default | description |
|---|---|---|
| db | none (required) | the TPC-DS database |
| benchmark | tpcds-v2.4-benchmark | the name of the application |
| iterations | 3 | the number of iterations to run |
| filter | a | filter on the name of the queries to run, e.g. q1-v2.4 |
Example: the following command runs the TPC-DS benchmark against the existing database tpcds_sf10.
$SPARK_HOME/bin/spark-submit \
--class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
kyuubi-tpcds-*.jar --db tpcds_sf10
A single TPC-DS query can also be run by passing a filter:
$SPARK_HOME/bin/spark-submit \
--class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
kyuubi-tpcds-*.jar --db tpcds_sf10 --filter q1-v2.4
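The options from the table can be combined in one invocation. The sketch below assembles such a command line and prints it without running it. It assumes that `--iterations` is passed the same way as `--db` and `--filter` in the examples above, which the README's option table suggests but does not show; `benchmark_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Print (dry run) a RunBenchmark command line.
# Usage: benchmark_cmd <db> [iterations] [filter]
# Assumption: --iterations is accepted like --db and --filter.
benchmark_cmd() {
  db="$1"
  iterations="${2:-3}"   # 3 is the documented default
  filter="${3:-}"        # empty means: do not pass --filter
  cmd="\$SPARK_HOME/bin/spark-submit --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark kyuubi-tpcds-*.jar --db ${db} --iterations ${iterations}"
  if [ -n "$filter" ]; then
    cmd="${cmd} --filter ${filter}"
  fi
  printf '%s\n' "$cmd"
}

# Five iterations of q1-v2.4 against tpcds_sf10.
benchmark_cmd tpcds_sf10 5 q1-v2.4
```

More iterations give more samples per query, which makes the reported average and standard deviation less sensitive to a single slow run.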
The benchmark result looks like:
| name | minTimeMs | maxTimeMs | avgTimeMs | stdDev | stdDevPercent |
|---|---|---|---|---|---|
| q1-v2.4 | 50.522384 | 868.010383 | 323.398267 | 471.6482 | 145.8413108576 |
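The numbers in the sample row are consistent with `stdDevPercent` being the standard deviation expressed as a percentage of the average, that is `stdDev / avgTimeMs * 100`. This relationship is inferred from the row above, not taken from the module's source code. A minimal sketch that reproduces the figure with `awk` (the function name `stddev_percent` is hypothetical):

```shell
#!/bin/sh
# Relative standard deviation in percent: stdDev / avg * 100.
# Inferred from the sample row; not taken from the module's source code.
stddev_percent() {
  awk -v sd="$1" -v avg="$2" 'BEGIN { printf "%.2f\n", sd / avg * 100 }'
}

# stdDev and avgTimeMs from the q1-v2.4 row above.
stddev_percent 471.6482 323.398267   # prints 145.84
```

A value above 100 percent, as in this row, means the run times varied more than the average itself. Here the maximum (about 868 ms) is far above the minimum (about 51 ms), so the average is dominated by one slow run.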