Introduction

This module includes TPC-DS data generator and benchmark tool.

How to use

Package the jar with the following command:

./build/mvn clean package -Ptpcds -pl dev/kyuubi-tpcds -am

Data Generator

Supported options:

key	default	description
db	default	the database to write data to
scaleFactor	1	the scale factor of TPC-DS
format	parquet	the storage format of the tables
parallel	scaleFactor * 2	the parallelism of the Spark job

Example: the following command generates 10 GB of data in a new database, tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.DataGenerator \
  kyuubi-tpcds_*.jar \
  --db tpcds_sf10 --scaleFactor 10 --format parquet --parallel 20
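Because `parallel` defaults to `scaleFactor * 2`, the `--parallel 20` in the example above matches what the default would be for scale factor 10. The following is a minimal shell sketch of that relationship. It assumes a hypothetical helper named `datagen_args`, which is not part of this module, and it only prints the argument list instead of submitting a job:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with kyuubi-tpcds): derive the
# DataGenerator arguments from a single scale factor.
datagen_args() {
  local scale_factor="$1"
  # Mirrors the documented default: parallel = scaleFactor * 2.
  local parallel=$((scale_factor * 2))
  printf -- '--db tpcds_sf%s --scaleFactor %s --format parquet --parallel %s\n' \
    "$scale_factor" "$scale_factor" "$parallel"
}

# Dry run: print the arguments for a 10 GB data set.
datagen_args 10
```

The printed arguments can be appended to the spark-submit command shown above, for example `kyuubi-tpcds_*.jar $(datagen_args 100)` for a scale factor of 100.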

Benchmark Tool

Supported options:

key	default	description
db	none (required)	the TPC-DS database
benchmark	tpcds-v2.4-benchmark	the name of the application
iterations	3	the number of iterations to run
breakdown	false	whether to record breakdown results of an execution
filter	a	filter on the name of the queries to run, e.g. q1-v2.4
results-dir	/spark/sql/performance	directory to store benchmark results, e.g. hdfs://hdfs-nn:9870/pref

Example: the following command benchmarks TPC-DS at scale factor 10 against the existing database tpcds_sf10.

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10

Running a single TPC-DS query is also supported:

$SPARK_HOME/bin/spark-submit \
  --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \
  kyuubi-tpcds_*.jar --db tpcds_sf10 --filter q1-v2.4
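To run a handful of queries one at a time, the single-query form can be repeated per query name. The sketch below assumes a hypothetical wrapper named `benchmark_cmds`, which is not part of this module, and it assumes that query names other than `q1-v2.4` follow the same pattern. It prints the commands rather than executing them:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print one RunBenchmark invocation per query name.
# Only the --db and --filter options from the examples above are used.
benchmark_cmds() {
  local db="$1"
  shift
  local query
  for query in "$@"; do
    # $SPARK_HOME and the jar glob are left unexpanded in the printed command.
    echo "\$SPARK_HOME/bin/spark-submit --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark kyuubi-tpcds_*.jar --db ${db} --filter ${query}"
  done
}

# Dry run for two queries against the tpcds_sf10 database.
benchmark_cmds tpcds_sf10 q1-v2.4 q2-v2.4
```

Each printed command would start a separate Spark application, so this approach trades startup overhead for isolated per-query runs.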

The TPC-DS benchmark result looks like:

name	minTimeMs	maxTimeMs	avgTimeMs	stdDev	stdDevPercent
q1-v2.4	50.522384	868.010383	323.398267	471.6482	145.8413108576
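The summary columns are related by simple arithmetic. In the sample row, `stdDevPercent` equals `stdDev / avgTimeMs * 100`, and the figures are consistent with a sample standard deviation (dividing by n - 1) over the default three iterations. The sketch below checks this with `awk`. The function name `summarize` is hypothetical, and the third timing, 51.662034, does not appear in the result above: it is inferred from the reported average under the three-iteration assumption.

```shell
#!/usr/bin/env bash
# Recompute the summary columns for a list of per-iteration timings (ms).
# Assumption: stdDev is the sample standard deviation (divides by n - 1).
summarize() {
  printf '%s\n' "$@" | awk '
    { t[NR] = $1; sum += $1 }
    END {
      avg = sum / NR
      for (i = 1; i <= NR; i++) ss += (t[i] - avg) ^ 2
      sd = sqrt(ss / (NR - 1))
      printf "avg=%.2f stdDev=%.2f stdDevPercent=%.2f\n", avg, sd, sd / avg * 100
    }'
}

# min and max come from the sample row; the remaining value is inferred.
summarize 50.522384 868.010383 51.662034
# prints: avg=323.40 stdDev=471.65 stdDevPercent=145.84
```

A `stdDevPercent` above 100, as here, indicates that one iteration (the 868 ms run) was far slower than the others, so the average alone is a poor summary for this query.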