spark-sql-perf

Author	SHA1	Message	Date
ludatabricks	e8aa132bb8	[ML-3870] Make spark-sql-perf master compile with Spark 2.3 and Scala 2.11 (#155) Change the build config to update to Spark 2.3 and update the Scala dependency in bin/spark-perf	2018-06-15 06:40:14 -07:00
Siddharth Murching	d0de5ae8aa	Update tests to run with Spark 2.2, add NaiveBayes & Bucketizer ML tests (#110) * Made small updates in Benchmark.scala and Query.scala for Spark 2.2 * Added tests for NaiveBayesModel and Bucketizer * Changed BenchmarkAlgorithm.getEstimator() -> BenchmarkAlgorithm.getPipelineStage() to allow for the benchmarking of Estimators and Transformers instead of just Estimators Commits: * Changes made so that spark-sql-perf compiles with Spark 2.2 * Updates for running ML tests from the command line + added Naive Bayes test * Add Bucketizer test as example of Featurizer test; change getEstimator() to getPipelineStage() in BenchmarkAlgorithm to allow for testing of transformers in addition to estimators. * Add comment for main method in MLlib.scala * Rename MLTransformerBenchmarkable --> MLPipelineStageBenchmarkable, fix issue with NaiveBayes param * Add UnaryTransformer trait for common data/methods to be shared across all objects testing featurizers that operate on a single column (StringIndexer, OneHotEncoder, Bucketizer, HashingTF, etc) * Respond to review comments: * bin/run-ml: Add newline at EOF * Query.scala: organized imports * MLlib.scala: organized imports, fixed SparkContext initialization * NaiveBayes.scala: removed unused temp val, improved probability calculation in trueModel() * Bucketizer.scala: use DataGenerator.generateContinuousFeatures instead of generating data on the driver * Fix bug in Bucketizer.scala * Precompute log of sum of unnormalized probabilities in NaiveBayes.scala, add NaiveBayes and Bucketizer tests to mllib-small.yaml * Update Query.scala to use p() to access SparkPlans under a given SparkPlan * Update README to indicate that spark-sql-perf only works with Spark 2.2+ after this PR	2017-08-21 15:07:46 -07:00
Michael Armbrust	9d3347e949	Improvements to running the benchmark - Scripts for running the benchmark either while working on spark-sql-perf (bin/run) or while working on Spark (bin/spark-perf). The latter uses Spark's sbt build to compile Spark and downloads the most recent published version of spark-sql-perf. - Adds a `--compare` option that can be used to compare the results with a baseline run Author: Michael Armbrust <michael@databricks.com> Closes #49 from marmbrus/runner.	2016-01-24 20:24:54 -08:00
Michael Armbrust	663ca7560e	Main Class for running Benchmarks from the command line. This PR adds the ability to run performance tests locally as a standalone program that reports the results to the console: ``` $ bin/run --help spark-sql-perf 0.2.0 Usage: spark-sql-perf [options] -b <value> | --benchmark <value> the name of the benchmark to run -f <value> | --filter <value> a filter on the name of the queries to run -i <value> | --iterations <value> the number of iterations to run --help prints this usage text $ bin/run --benchmark DatasetPerformance ``` Author: Michael Armbrust <michael@databricks.com> Closes #47 from marmbrus/MainClass.	2016-01-19 12:37:51 -08:00

4 Commits