This patch extracts `Query` into its own top-level class and makes its `sparkContext` field transient in order to fix `NotSerializableException`s.
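The effect of the `@transient` marker can be sketched without Spark; here `Context` is a hypothetical stand-in for `SparkContext`, and the class shapes are illustrative rather than spark-sql-perf's actual API:

```scala
import java.io._

// Hypothetical stand-in for SparkContext: a class that is NOT Serializable.
class Context

// Marking the field @transient excludes it from Java serialization, so
// serializing a Query no longer drags the non-serializable context along.
class Query(val name: String, @transient val ctx: Context) extends Serializable

object TransientDemo {
  // Serialize and deserialize through an in-memory buffer.
  def roundTrip(q: Query): Query = {
    val buf = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(buf)
    oos.writeObject(q)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
      .readObject().asInstanceOf[Query]
  }

  def main(args: Array[String]): Unit = {
    val q = roundTrip(new Query("q1", new Context))
    assert(q.name == "q1")  // ordinary fields survive the round trip
    assert(q.ctx == null)   // transient field is dropped, not serialized
    println("round trip OK")
  }
}
```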
Author: Josh Rosen <rosenville@gmail.com>
Closes #53 from JoshRosen/make-query-into-top-level-class.
This patch adds additional constructors to `TPCDS` to maintain backwards-compatibility with code which calls `new TPCDS(anExistingSqlContext)`. This constructor was removed in #47.
The motivation for backwards-compatibility here is to simplify the gradual roll-out of an updated spark-sql-perf library to some existing jobs which share the same notebook.
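The backwards-compatibility pattern can be sketched with hypothetical stand-in types (the real ones are Spark's `SQLContext` and spark-sql-perf's `TPCDS`); the extra parameter and its default are illustrative, not the actual API:

```scala
// Stand-in for Spark's SQLContext.
class SqlContextStub

class TPCDS(val sqlContext: SqlContextStub, val resultsLocation: String) {
  // Re-added auxiliary constructor: existing call sites that pass only a
  // SQLContext keep compiling against the newer primary constructor.
  def this(sqlContext: SqlContextStub) = this(sqlContext, "/tmp/results")
}

object CompatDemo {
  def main(args: Array[String]): Unit = {
    val tpcds = new TPCDS(new SqlContextStub)  // old-style call still works
    assert(tpcds.resultsLocation == "/tmp/results")
    println("constructor compat OK")
  }
}
```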
Author: Josh Rosen <rosenville@gmail.com>
Closes #52 from JoshRosen/backwards-compatible-tpcds-constructor.
- Scripts for running the benchmark either while working on spark-sql-perf (`bin/run`) or while working on Spark (`bin/spark-perf`). The latter uses Spark's sbt build to compile Spark and downloads the most recent published version of spark-sql-perf.
- Adds a `--compare` flag that can be used to compare the results with a baseline run.
Author: Michael Armbrust <michael@databricks.com>
Closes #49 from marmbrus/runner.
This PR adds the ability to run performance tests locally as a standalone program that reports the results to the console:
```
$ bin/run --help
spark-sql-perf 0.2.0
Usage: spark-sql-perf [options]

  -b <value> | --benchmark <value>
        the name of the benchmark to run
  -f <value> | --filter <value>
        a filter on the name of the queries to run
  -i <value> | --iterations <value>
        the number of iterations to run
  --help
        prints this usage text
$ bin/run --benchmark DatasetPerformance
```
Author: Michael Armbrust <michael@databricks.com>
Closes #47 from marmbrus/MainClass.
After this you should be able to use the library in the shell as follows:
```
bin/spark-shell --packages com.databricks:spark-sql-perf:0.2.3
```
Author: Michael Armbrust <michael@databricks.com>
Closes #46 from marmbrus/publishToMaven.
In Spark 1.6, `getLong` throws an exception if the value is null; before 1.6 it returned 0. With this PR, we check whether the result is null and, if so, return null instead of 0.
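The null-safe read can be sketched against a minimal `Row`-like interface with the same `isNullAt`/`getLong` shape as Spark SQL's `Row`; the trait and helper names here are illustrative:

```scala
// Minimal stand-in for Spark SQL's Row.
trait RowLike {
  def isNullAt(i: Int): Boolean
  def getLong(i: Int): Long
}

object NullSafeRead {
  // Before: getLong on a null cell throws on Spark 1.6 (and silently
  // returned 0 on earlier versions). After: check for null first and
  // surface the null to the caller (here as None).
  def readLong(row: RowLike, i: Int): Option[Long] =
    if (row.isNullAt(i)) None else Some(row.getLong(i))

  def main(args: Array[String]): Unit = {
    val row = new RowLike {
      def isNullAt(i: Int): Boolean = i == 0
      def getLong(i: Int): Long = 42L
    }
    assert(readLong(row, 0).isEmpty)        // null cell -> None, not 0
    assert(readLong(row, 1).contains(42L))  // non-null cell -> the value
    println("null-safe read OK")
  }
}
```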
Author: Yin Huai <yhuai@databricks.com>
Closes #41 from yhuai/fixSumHash.
The changes in this PR are centered around making `Benchmark#runExperiment` accept things other than `Query`s. In particular, in spark-perf we don't always have a DataFrame or an RDD to work with and may want to run arbitrary code (e.g. ALS.train). This PR makes it possible to use the same code in `Benchmark` to do this.
I tested this on dogfood and it works well there.
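The generalization can be sketched as a minimal trait that any workload (a SQL query, `ALS.train`, or arbitrary code) can implement, so one timing loop serves them all. The names below are hypothetical, not spark-sql-perf's actual API:

```scala
// Anything with a name and a run() body can be benchmarked.
trait Benchmarkable {
  def name: String
  def run(): Unit
}

object MiniBenchmark {
  // Run each workload `iterations` times and record wall-clock millis.
  def runExperiment(workloads: Seq[Benchmarkable],
                    iterations: Int): Map[String, Seq[Long]] =
    workloads.map { w =>
      val times = (1 to iterations).map { _ =>
        val start = System.nanoTime()
        w.run()
        (System.nanoTime() - start) / 1000000L
      }
      w.name -> times
    }.toMap

  def main(args: Array[String]): Unit = {
    val noop = new Benchmarkable {
      val name = "noop"
      def run(): Unit = ()
    }
    val results = runExperiment(Seq(noop), iterations = 3)
    assert(results("noop").length == 3)  // one timing per iteration
    println("experiment OK")
  }
}
```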
Author: Andrew Or <andrew@databricks.com>
Closes #33 from andrewor14/spark-perf.
This allows us to report Spark perf results in the same format as SQL benchmark results. cc marmbrus
Author: Andrew Or <andrew@databricks.com>
Closes #30 from andrewor14/spark-perf.
Some internal interfaces changed, so we need to bump the Spark version to run tests on Spark 1.6.
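A version bump of this kind amounts to a one-line change in the sbt build; this is a hypothetical `build.sbt` fragment, and the exact coordinates in the real build may differ:

```scala
// Hypothetical build.sbt fragment: point the Spark dependency at 1.6.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0" % "provided"
```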
Author: Michael Armbrust <michael@databricks.com>
Closes #29 from marmbrus/spark16.