* Refactor deprecated `getOrCreate()` usage for Spark 3
* Compile with Scala 2.12
* Update usage of obsolete/deprecated features
* Remove scala-logging; use slf4j directly instead
Reverts #157 due to library errors when the previous version is already in the classpath (e.g., in Databricks), and because it did not bring any noted improvements or needed fixes. Exception:
java.lang.InstantiationError: com.typesafe.scalalogging.Logger
This reverts commit 56f7348.
* Made small updates in Benchmark.scala and Query.scala for Spark 2.2
* Added tests for NaiveBayesModel and Bucketizer
* Changed BenchmarkAlgorithm.getEstimator() -> BenchmarkAlgorithm.getPipelineStage() to allow benchmarking of both Estimators and Transformers instead of just Estimators
Commits:
* Changes made so that spark-sql-perf compiles with Spark 2.2
* Updates for running ML tests from the command line + added Naive Bayes test
* Add Bucketizer test as an example of a featurizer test; change getEstimator() to getPipelineStage() in
BenchmarkAlgorithm to allow for testing of Transformers in addition to Estimators.
* Add comment for main method in MLlib.scala
* Rename MLTransformerBenchmarkable --> MLPipelineStageBenchmarkable, fix issue with NaiveBayes param
* Add UnaryTransformer trait for common data/methods to be shared across all objects
testing featurizers that operate on a single column (StringIndexer, OneHotEncoder, Bucketizer, HashingTF, etc.)
* Respond to review comments:
* bin/run-ml: Add newline at EOF
* Query.scala: organized imports
* MLlib.scala: organized imports, fixed SparkContext initialization
* NaiveBayes.scala: removed unused temp val, improved probability calculation in trueModel()
* Bucketizer.scala: use DataGenerator.generateContinuousFeatures instead of generating data on the driver
* Fix bug in Bucketizer.scala
* Precompute log of sum of unnormalized probabilities in NaiveBayes.scala, add NaiveBayes and Bucketizer tests to mllib-small.yaml
* Update Query.scala to use p() to access SparkPlans under a given SparkPlan
* Update README to indicate that spark-sql-perf only works with Spark 2.2+ after this PR
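The motivation for the getEstimator() -> getPipelineStage() rename can be pictured with a minimal, self-contained sketch (no Spark dependency; the trait and class names below are simplified stand-ins for the Spark ML types, not the project's actual code): in Spark ML, both `Estimator` and `Transformer` extend `PipelineStage`, so returning the common supertype lets a benchmark wrap either one.

```scala
// Simplified stand-ins for the Spark ML type hierarchy.
trait PipelineStage { def name: String }
trait Estimator extends PipelineStage   // fitted on data to produce a model
trait Transformer extends PipelineStage // applied directly to data

case class NaiveBayes() extends Estimator { val name = "NaiveBayes" }
case class Bucketizer() extends Transformer { val name = "Bucketizer" }

trait BenchmarkAlgorithm {
  // Previously: def getEstimator(): Estimator
  // That signature could not return a Transformer such as Bucketizer.
  def getPipelineStage(): PipelineStage
}

object NaiveBayesBenchmark extends BenchmarkAlgorithm {
  def getPipelineStage(): PipelineStage = NaiveBayes()
}

object BucketizerBenchmark extends BenchmarkAlgorithm {
  def getPipelineStage(): PipelineStage = Bucketizer()
}
```

With the widened return type, the same benchmarking harness can drive an Estimator (calling fit) or a Transformer (calling transform) behind a single interface.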
Now that Spark 2.0.0 is released, we need to update the build to use a released version instead of the snapshot (which is no longer available).
Fixes #84.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #85 from JoshRosen/fix-spark-dep.
Fixes the usual scala-logging issues to make the source code cross-compile between Scala 2.10 and Scala 2.11.
Tests:
A Scala 2.11 version of the code has been run against the official Spark 2.0.0 RC4 binary release (Scala 2.11).
A Scala 2.10 version has been run against the official Spark 1.6.2 release.
Author: Timothy Hunter <timhunter@databricks.com>
Closes #81 from thunterdb/1607-scala211.
This PR adds basic MLlib infrastructure to run some benchmarks against ML pipelines.
There are 2 ways to describe and run ML pipelines:
- programmatically, in Scala (see MLBenchmarks.scala)
- using a simple YAML file (see mllib-small.yaml for an example)
The YAML approach is preferred because it programmatically generates the cartesian product of all the experiments to run and validates the types of the objects in the YAML file.
In both cases, all the ML experiments are standard benchmarks.
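The cartesian-product expansion can be pictured with a sketch like the following. This is a hypothetical illustration only: the key names (`common`, `benchmarks`, `params`, etc.) are assumptions, not the actual schema; see mllib-small.yaml for the real format.

```yaml
# Hypothetical sketch -- key names are illustrative, not the real schema.
common:
  numPartitions: 4
  randomSeed: 42
benchmarks:
  - name: naive.bayes
    params:
      numExamples: [10000, 100000]  # 2 values ...
      numFeatures: [10, 100]        # ... x 2 values = 4 experiments
```

Each list-valued parameter contributes one axis to the cartesian product, so the single entry above expands into four distinct experiments.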
This PR also moves some code in `Benchmark.scala`: the current code generates path-dependent structural signatures and confuses IntelliJ.
It does not include tests, but some small benchmarks can be run locally against a Spark 2 installation:
```
$SPARK_HOME/bin/spark-shell --jars $PWD/target/scala-2.10/spark-sql-perf-assembly-0.4.9-SNAPSHOT.jar
```
and then:
```scala
com.databricks.spark.sql.perf.mllib.MLLib.run(yamlFile="src/main/scala/configs/mllib-small.yaml")
```
Author: Timothy Hunter <timhunter@databricks.com>
Closes #69 from thunterdb/1605-mllib2.
- Scripts for running the benchmark either while working on spark-sql-perf (bin/run) or while working on Spark (bin/spark-perf). The latter uses Spark's sbt build to compile Spark and downloads the most recently published version of spark-sql-perf.
- Adds a `--compare` flag that can be used to compare the results with a baseline run
Author: Michael Armbrust <michael@databricks.com>
Closes #49 from marmbrus/runner.
This PR adds the ability to run performance tests locally as a standalone program that reports the results to the console:
```
$ bin/run --help
spark-sql-perf 0.2.0
Usage: spark-sql-perf [options]
-b <value> | --benchmark <value>
the name of the benchmark to run
-f <value> | --filter <value>
a filter on the name of the queries to run
-i <value> | --iterations <value>
the number of iterations to run
--help
prints this usage text
$ bin/run --benchmark DatasetPerformance
```
Author: Michael Armbrust <michael@databricks.com>
Closes #47 from marmbrus/MainClass.
After this you should be able to use the library in the shell as follows:
```
bin/spark-shell --packages com.databricks:spark-sql-perf:0.2.3
```
Author: Michael Armbrust <michael@databricks.com>
Closes #46 from marmbrus/publishToMaven.
Some internal interfaces changed, so we need to bump the Spark version to run tests on Spark 1.6.
Author: Michael Armbrust <michael@databricks.com>
Closes #29 from marmbrus/spark16.