spark-sql-perf

Author	SHA1	Message	Date
Nico Poggi	6136ecea6e	TPC-H datagenerator and instructions (#136) * Adding basic partitioning to TPCH tables following VectorH paper as baseline * Multi datagen (TPC-H and DS) and multi scale factor notebook/script. Generates all the selected scale factors and benchmarks in one run. * TPCH runner notebook or script for spark-shell * Adding basic TPCH documentation	2018-09-10 23:18:33 +02:00
Juliusz Sompolski	91604a3ab0	Update README to specify that TPCDS kit needs to be installed on all nodes.	2018-02-27 12:06:12 +01:00
Juliusz Sompolski	31f34beee5	Update README to do sql("use database") (#123)	2017-11-07 20:38:26 +01:00
Juliusz Sompolski	5ebb9cfb12	add some more comments	2017-09-12 16:51:26 +02:00
Juliusz Sompolski	c78f2b3a9b	update readme	2017-09-12 16:40:23 +02:00
Siddharth Murching	d0de5ae8aa	Update tests to run with Spark 2.2, add NaiveBayes & Bucketizer ML tests (#110) * Made small updates in Benchmark.scala and Query.scala for Spark 2.2 * Added tests for NaiveBayesModel and Bucketizer * Changed BenchmarkAlgorithm.getEstimator() -> BenchmarkAlgorithm.getPipelineStage() to allow for the benchmarking of Estimators and Transformers instead of just Estimators Commits: * Changes made so that spark-sql-perf compiles with Spark 2.2 * Updates for running ML tests from the command line + added Naive Bayes test * Add Bucketizer test as example of Featurizer test; change getEstimator() to getPipelineStage() in BenchmarkAlgorithm to allow for testing of transformers in addition to estimators. * Add comment for main method in MLlib.scala * Rename MLTransformerBenchmarkable --> MLPipelineStageBenchmarkable, fix issue with NaiveBayes param * Add UnaryTransformer trait for common data/methods to be shared across all objects testing featurizers that operate on a single column (StringIndexer, OneHotEncoder, Bucketizer, HashingTF, etc) * Respond to review comments: * bin/run-ml: Add newline at EOF * Query.scala: organized imports * MLlib.scala: organized imports, fixed SparkContext initialization * NaiveBayes.scala: removed unused temp val, improved probability calculation in trueModel() * Bucketizer.scala: use DataGenerator.generateContinuousFeatures instead of generating data on the driver * Fix bug in Bucketizer.scala * Precompute log of sum of unnormalized probabilities in NaiveBayes.scala, add NaiveBayes and Bucketizer tests to mllib-small.yaml * Update Query.scala to use p() to access SparkPlans under a given SparkPlan * Update README to indicate that spark-sql-perf only works with Spark 2.2+ after this PR	2017-08-21 15:07:46 -07:00
Kevin	fdcde7595c	Update README (#107) Little update for the README	2017-07-13 10:45:24 +02:00
Michael Armbrust	663ca7560e	Main Class for running Benchmarks from the command line This PR adds the ability to run performance test locally as a stand alone program that reports the results to the console: ``` $ bin/run --help spark-sql-perf 0.2.0 Usage: spark-sql-perf [options] -b <value> | --benchmark <value> the name of the benchmark to run -f <value> | --filter <value> a filter on the name of the queries to run -i <value> | --iterations <value> the number of iterations to run --help prints this usage text $ bin/run --benchmark DatasetPerformance ``` Author: Michael Armbrust <michael@databricks.com> Closes #47 from marmbrus/MainClass.	2016-01-19 12:37:51 -08:00
Davies Liu	cec648ac0f	try to run all TPCDS queries in benchmark (even can't be parsed)	2016-01-08 15:03:44 -08:00
Nong Li	1aa5bfc838	Add remaining tpcds tables. Author: Nong Li <nongli@gmail.com> Closes #34 from nongli/tpcds.	2015-11-19 13:50:00 -08:00
Cheng Lian	50808c436b	Fixes typos in README.md Author: Cheng Lian <lian@databricks.com> Closes #25 from liancheng/readme-fix.	2015-11-11 12:05:44 -08:00
Michael Armbrust	ddeead18ce	Add compilation testing with travis There are no tests yet... but this at least tests compilation. Author: Michael Armbrust <michael@databricks.com> Closes #15 from marmbrus/travis.	2015-09-09 21:36:26 -07:00
Yin Huai	34f66a0a10	Add a option of filter rows with null partition column values.	2015-08-26 11:14:19 -07:00
Yin Huai	06eb11f326	Fix the seed to 100 and use distribute by instead of order by.	2015-08-25 20:44:14 -07:00
Yin Huai	9936d49239	Add a option to orderBy partition columns.	2015-08-25 20:44:14 -07:00
Yin Huai	58188c6711	Allow users to use double instead of decimal for generated tables.	2015-08-25 20:44:14 -07:00
Yin Huai	88aadb45a4	Update README.	2015-08-25 20:44:14 -07:00
Yin Huai	97093a45cd	Update readme and register temp tables.	2015-08-25 20:44:13 -07:00
Michael Armbrust	a239da90a2	more cleanup, update readme	2015-08-11 15:51:34 -07:00
Yin Huai	fb9939b136	includeBreakdown is a parameter of runExperiment.	2015-04-20 10:03:41 -07:00
Yin Huai	6c5657b609	Refactoring and doc.	2015-04-16 18:10:57 -07:00
Yin Huai	930751810e	Initial port.	2015-04-15 20:03:14 -07:00
Yin Huai	e81669ab3b	Initial commit.	2015-04-15 20:02:32 -07:00

23 Commits