* Made small updates in Benchmark.scala and Query.scala for Spark 2.2 * Added tests for NaiveBayesModel and Bucketizer * Changed BenchmarkAlgorithm.getEstimator() -> BenchmarkAlgorithm.getPipelineStage() to allow for the benchmarking of Estimators and Transformers instead of just Estimators Commits: * Changes made so that spark-sql-perf compiles with Spark 2.2 * Updates for running ML tests from the command line + added Naive Bayes test * Add Bucketizer test as example of Featurizer test; change getEstimator() to getPipelineStage() in BenchmarkAlgorithm to allow for testing of transformers in addition to estimators. * Add comment for main method in MLlib.scala * Rename MLTransformerBenchmarkable --> MLPipelineStageBenchmarkable, fix issue with NaiveBayes param * Add UnaryTransformer trait for common data/methods to be shared across all objects testing featurizers that operate on a single column (StringIndexer, OneHotEncoder, Bucketizer, HashingTF, etc) * Respond to review comments: * bin/run-ml: Add newline at EOF * Query.scala: organized imports * MLlib.scala: organized imports, fixed SparkContext initialization * NaiveBayes.scala: removed unused temp val, improved probability calculation in trueModel() * Bucketizer.scala: use DataGenerator.generateContinuousFeatures instead of generating data on the driver * Fix bug in Bucketizer.scala * Precompute log of sum of unnormalized probabilities in NaiveBayes.scala, add NaiveBayes and Bucketizer tests to mllib-small.yaml * Update Query.scala to use p() to access SparkPlans under a given SparkPlan * Update README to indicate that spark-sql-perf only works with Spark 2.2+ after this PR
9 lines
243 B
Bash
Executable File
9 lines
243 B
Bash
Executable File
#!/bin/bash
|
|
|
|
# runs spark-sql-perf ML tests from the current directory
|
|
# accepts a single command-line argument; the path to a YAML configuration file
|
|
# specifying the tests to be run & their params
|
|
|
|
ARGS="runMLBenchmark $@"
|
|
build/sbt "$ARGS"
|