## What changes were proposed in this pull request?
Since SPARK-20690 and SPARK-20916, Spark requires every subquery in a FROM clause to have an alias.
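For example, a query of the following shape, which previously ran without an alias, now needs one (illustrative query, not taken from the benchmark sources):

```sql
-- Rejected since SPARK-20690 / SPARK-20916: the subquery has no alias.
SELECT * FROM (SELECT 1);

-- Accepted: the subquery in the FROM clause carries the alias `t`.
SELECT * FROM (SELECT 1) AS t;
```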
## How was this patch tested?
Tested at scale factor 1 (SF1).
Data generation:
* Add an option to convert Dates to Strings, and specify it in the Tables object creator.
* Add partition discovery to `createExternalTables`.
* Add an `analyzeTables` function that gathers table statistics.
Benchmark execution:
* Perform `collect()` on the DataFrame so that the query is recorded in the SQL tab of the Spark UI.
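The benchmark-execution change above amounts to forcing an action on the query. A minimal sketch, assuming a running `SparkSession` named `spark` and an illustrative table name (neither is from the patch itself):

```scala
// Queries are lazy until an action runs; collect() forces execution so the
// query appears in the SQL tab of the Spark UI.
val df = spark.sql("SELECT COUNT(*) FROM store_sales")
df.collect()
```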
Now that Spark 2.0.0 is released, we need to update the build to use a released version instead of the snapshot (which is no longer available).
Fixes #84.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #85 from JoshRosen/fix-spark-dep.
Fixes the usual scala-logging issues so that the source code cross-compiles between Scala 2.10 and Scala 2.11.
Tests:
A Scala 2.11 build of the code has been run against the official Spark 2.0.0 RC4 binary release (Scala 2.11).
A Scala 2.10 build has been run against the official Spark 1.6.2 release.
Author: Timothy Hunter <timhunter@databricks.com>
Closes #81 from thunterdb/1607-scala211.
This has been tested locally with a small amount of data.
I have not bothered to reimplement a more robust version of the ALS synthetic data generation, so it will still require some manual parameter tweaking as before.
Author: Timothy Hunter <timhunter@databricks.com>
Closes #76 from thunterdb/1607-als.
This PR adds basic MLlib infrastructure to run some benchmarks against ML pipelines.
There are 2 ways to describe and run ML pipelines:
- programmatically, in Scala (see `MLBenchmarks.scala`)
- using a simple YAML file (see `mllib-small.yaml` for an example)
The YAML approach is preferred: it programmatically generates the cartesian product of all the experiments to run, and it validates the types of the objects in the YAML file.
In both cases, all the ML experiments are standard benchmarks.
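The cartesian-product expansion mentioned above can be sketched as follows; this is a hypothetical helper, not the actual spark-sql-perf implementation, and the parameter names are illustrative:

```scala
// Hypothetical sketch of expanding a parameter grid (as might be parsed from
// a YAML config) into the cartesian product of all experiments to run.
object ExperimentExpansion {
  // Each key maps to the list of values to try for that parameter.
  type ParamGrid = Map[String, Seq[Any]]

  // Expand a grid into one parameter map per concrete experiment.
  def cartesian(grid: ParamGrid): Seq[Map[String, Any]] =
    grid.foldLeft(Seq(Map.empty[String, Any])) { case (acc, (key, values)) =>
      for (m <- acc; v <- values) yield m + (key -> v)
    }
}
```

For instance, a grid with two values for `numPartitions` and two for `regParam` expands into four experiments.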
This PR also moves some code in `Benchmark.scala`: the current code generates path-dependent structural signatures and confuses IntelliJ.
It does not include tests, but some small benchmarks can be run locally against a Spark 2.x installation:
```
$SPARK_HOME/bin/spark-shell --jars $PWD/target/scala-2.10/spark-sql-perf-assembly-0.4.9-SNAPSHOT.jar
```
and then:
```scala
com.databricks.spark.sql.perf.mllib.MLLib.run(yamlFile="src/main/scala/configs/mllib-small.yaml")
```
Author: Timothy Hunter <timhunter@databricks.com>
Closes #69 from thunterdb/1605-mllib2.