Yin Huai
3af656defa
Make ExecutionMode.HashResults handle null values
...
In Spark 1.6, `getLong` throws an exception when the value is null; before 1.6 it returned 0. With this PR, we check whether the result is null and, if it is, return null instead of 0.
Author: Yin Huai <yhuai@databricks.com>
Closes #41 from yhuai/fixSumHash.
2015-12-08 15:28:48 -08:00
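A minimal sketch of the fix described in the entry above, not the actual PR code: the nullable SQL column is modeled here as an `Option[Long]` standing in for Spark's `Row` API, and null is propagated instead of 0.

```scala
// Hypothetical helper illustrating the null check: in Spark 1.6, getLong on a
// null value throws (before 1.6 it returned 0), so the value is inspected for
// null first and null is returned rather than 0.
object HashResultSketch {
  def hashResult(value: Option[Long]): Any = value match {
    case Some(v) => v    // non-null: return the long value as before
    case None    => null // null: propagate null instead of 0
  }
}
```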
Nong Li
43c2f23bb9
Fixes for Q34 and Q73 to return results deterministically.
...
Author: Nong Li <nong@databricks.com>
Closes #38 from nongli/tpcds.
2015-11-25 15:03:33 -08:00
Nong
70e0dbe656
Add official TPCDS 1.4 queries.
...
Author: Nong <nong@cloudera.com>
Closes #36 from nongli/tpcds.
2015-11-24 13:12:46 -08:00
Nong Li
1aa5bfc838
Add remaining tpcds tables.
...
Author: Nong Li <nongli@gmail.com>
Closes #34 from nongli/tpcds.
2015-11-19 13:50:00 -08:00
Nong Li
8d9e8ce9a3
Add another fact table and updates to load a single table at a time.
...
Author: Nong Li <nongli@gmail.com>
Closes #31 from nongli/more_tables.
2015-11-18 11:12:01 -08:00
Andrew Or
426ae30a2e
Increase integration surface area with Spark perf
...
The changes in this PR are centered around making `Benchmark#runExperiment` accept things other than `Query`s. In particular, in spark-perf we don't always have a DataFrame or an RDD to work with and may want to run arbitrary code (e.g. ALS.train). This PR makes it possible to use the same code in `Benchmark` to do this.
I tested this on dogfood and it works well there.
Author: Andrew Or <andrew@databricks.com>
Closes #33 from andrewor14/spark-perf.
2015-11-18 10:50:46 -08:00
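A hypothetical sketch of the generalization described above (the trait and class names are illustrative, not the real spark-sql-perf API): `runExperiment` accepts any timeable item rather than only a `Query`, so arbitrary code such as an ML training call can be measured with the same machinery.

```scala
// Anything the benchmark can time.
trait Benchmarkable {
  def name: String
  def run(): Unit
}

// A SQL query; in the real project this would execute against Spark.
final case class QuerySketch(name: String, sql: String) extends Benchmarkable {
  def run(): Unit = ()
}

// Wraps an arbitrary block of code, e.g. a call like ALS.train(...).
final case class ArbitraryCode(name: String, body: () => Unit) extends Benchmarkable {
  def run(): Unit = body()
}

object BenchmarkSketch {
  // Times each item and returns (name, elapsed nanoseconds).
  def runExperiment(items: Seq[Benchmarkable]): Seq[(String, Long)] =
    items.map { item =>
      val start = System.nanoTime()
      item.run()
      (item.name, System.nanoTime() - start)
    }
}
```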
Andrew Or
172ae79f8d
Introduce small integration point with Spark perf
...
This allows us to report Spark perf results in the same format as SQL benchmark results. marmbrus
Author: Andrew Or <andrew@databricks.com>
Closes #30 from andrewor14/spark-perf.
2015-11-16 17:46:53 -08:00
Michael Armbrust
344b31ed69
Update to Spark 1.6
...
Some internal interfaces changed, so we need to bump the Spark version to run tests on Spark 1.6.
Author: Michael Armbrust <michael@databricks.com>
Closes #29 from marmbrus/spark16.
2015-11-13 12:40:00 -08:00
Nong Li
dc48f2e49b
Support generating the data as "text".
...
This previously failed because the text format supports only a single column. Having the option of text output is useful for quickly seeing what the generator is doing.
Author: Nong Li <nongli@gmail.com>
Closes #27 from nongli/text.
2015-11-11 12:05:14 -08:00
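An illustrative helper, not the actual generator code: since the text data source accepts only a single string column, a row's fields would first be flattened into one delimited string before writing.

```scala
// Flattens a row's fields into a single delimited string so it can be written
// through a one-column text output; nulls become empty fields.
object TextRowSketch {
  def toTextRow(fields: Seq[Any], sep: String = "|"): String =
    fields.map(f => if (f == null) "" else f.toString).mkString(sep)
}
```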
bit1129
f63d40ce9f
Add 2 queries
...
Author: bit1129 <bit1129@gmail.com>
Closes #22 from bit1129/master.
2015-09-16 10:10:20 -07:00
Michael Armbrust
40d085f1c7
Add dashboard notebook
...
Author: Michael Armbrust <michael@databricks.com>
Closes #21 from marmbrus/master.
2015-09-11 17:46:07 -07:00
Michael Armbrust
f03b3af719
Fail gracefully when invalid CPU logs are encountered
...
Author: Michael Armbrust <michael@databricks.com>
Closes #18 from marmbrus/parseCpuFail.
2015-09-09 22:02:23 -07:00
Michael Armbrust
e2dc749480
Add more tests for join performance
...
Author: Michael Armbrust <michael@databricks.com>
Closes #17 from marmbrus/joinPerf.
2015-09-09 21:56:47 -07:00
Michael Armbrust
08cb68ca20
Make it easier to write benchmarks in notebooks
...
Author: Michael Armbrust <michael@databricks.com>
Closes #19 from marmbrus/notebookTests.
2015-09-09 21:49:50 -07:00
Yin Huai
34f66a0a10
Add an option to filter rows with null partition column values.
2015-08-26 11:14:19 -07:00
Yin Huai
f4e20af107
Fix typo.
2015-08-25 23:31:50 -07:00
Yin Huai
06eb11f326
Fix the seed to 100 and use distribute by instead of order by.
2015-08-25 20:44:14 -07:00
Yin Huai
9936d49239
Add an option to orderBy partition columns.
2015-08-25 20:44:14 -07:00
Yin Huai
58188c6711
Allow users to use double instead of decimal for generated tables.
2015-08-25 20:44:14 -07:00
Yin Huai
77fbe22b7b
Address comments.
2015-08-25 20:44:13 -07:00
Yin Huai
97093a45cd
Update readme and register temp tables.
2015-08-25 20:44:13 -07:00
Yin Huai
edb4daba80
Bug fix.
2015-08-25 20:44:13 -07:00
Yin Huai
544adce70f
Add methods to genData.
2015-08-25 20:44:13 -07:00
Michael Armbrust
32215e05ee
Block completion of cpu collection
2015-08-24 16:13:26 -07:00
Michael Armbrust
00aa49e8e4
Add support for CPU Profiling.
2015-08-20 16:46:12 -07:00
Yin Huai
249157f6a6
Fix typo.
2015-08-17 12:56:35 -07:00
Yin Huai
d5c3104ec6
Address comments.
2015-08-14 11:39:06 -07:00
Yin Huai
51546868f4
Allow specifying the perf result location.
2015-08-13 18:43:50 -07:00
Yin Huai
11bfdc7c5a
Add an ExecutionMode to check query results.
2015-08-13 18:43:49 -07:00
Michael Armbrust
ed8ddfedcd
Address Yin's comments.
2015-08-13 17:54:00 -07:00
Michael Armbrust
4101a1e968
Fixes to breakdown calculation and table creation.
2015-08-13 15:47:01 -07:00
Michael Armbrust
a239da90a2
More cleanup; update readme.
2015-08-11 15:51:34 -07:00
Michael Armbrust
51b9dcb5b5
Merge remote-tracking branch 'origin/master' into refactor
...
Conflicts:
src/main/scala/com/databricks/spark/sql/perf/bigdata/Queries.scala
src/main/scala/com/databricks/spark/sql/perf/query.scala
src/main/scala/com/databricks/spark/sql/perf/runBenchmarks.scala
src/main/scala/com/databricks/spark/sql/perf/table.scala
src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/ImpalaKitQueries.scala
src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/SimpleQueries.scala
2015-08-07 15:31:32 -07:00
Jean-Yves Stephan
9421522820
Closing bracket
2015-07-22 15:03:43 -07:00
Yin Huai
a50fedd5bc
Merge pull request #2 from jystephan/master
...
Allow saving benchmark queries results as parquet files
2015-07-22 13:40:39 -07:00
Jean-Yves Stephan
653d82134d
No collect before saveAsParquet
2015-07-22 13:30:40 -07:00
Michael Armbrust
f00ad77985
with data generation
2015-07-22 00:29:58 -07:00
Jean-Yves Stephan
a4a53b8a73
Addressed Aaron's comments.
2015-07-21 20:05:53 -07:00
Jean-Yves Stephan
d866cce1a1
Format
2015-07-21 13:27:50 -07:00
Jean-Yves Stephan
933f3f0bb5
Removed queryOutputLocation parameter
2015-07-21 13:26:50 -07:00
Jean-Yves Stephan
9640cd8c1e
The execution mode (collect results / foreach results / write parquet) is now specified as an argument to Query.
2015-07-21 13:23:11 -07:00
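A hypothetical sketch of the modes named in the entry above (the type and case names are illustrative, not the real project's definitions): representing the execution mode as a sealed trait lets it be passed to a query instead of being hardcoded.

```scala
// Illustrative model of per-query execution modes.
sealed trait ExecutionMode
object ExecutionMode {
  case object CollectResults extends ExecutionMode // collect rows to the driver
  case object ForeachResults extends ExecutionMode // iterate without collecting
  final case class WriteParquet(location: String) extends ExecutionMode // save output
}

// The mode is an argument with a sensible default.
final case class QuerySpec(
    name: String,
    sql: String,
    mode: ExecutionMode = ExecutionMode.CollectResults)
```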
Jean-Yves Stephan
8e62e4fdbd
Added optional parameters to runBenchmark to specify a location to save query outputs as parquet files.
...
+ Removed the hardcoded baseDir/parquet/ structure
2015-07-20 17:09:20 -07:00
Michael Armbrust
eba8cea93c
Basic join performance tests
2015-07-13 16:20:36 -07:00
Michael Armbrust
eb3dd30c35
Refactor to work in notebooks
2015-07-03 11:26:06 -07:00
Pace Francesco
4f4b08a122
Reading hadoopConfiguration from Spark.
...
Read hadoopConfiguration from SparkContext instead of creating a new Configuration directly from Hadoop config files.
This allows us to use Hadoop parameters inserted or modified in one of Spark's config files (e.g., Swift credentials).
2015-06-19 15:01:57 +02:00
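A minimal model of the change above; the stub classes stand in for Hadoop's `Configuration` and Spark's `SparkContext`, which are assumed and not on the classpath here. Taking the configuration from the SparkContext preserves settings injected via Spark's config files, which a freshly constructed Configuration would miss.

```scala
// Stubs modeling the relevant pieces of Hadoop/Spark.
class ConfStub { var settings: Map[String, String] = Map.empty }
class SparkContextStub {
  val hadoopConfiguration = new ConfStub // Spark-managed Hadoop settings live here
}

object HadoopConfSketch {
  // After the change: reuse the SparkContext's configuration.
  def hadoopConf(sc: SparkContextStub): ConfStub = sc.hadoopConfiguration
  // Before the change: a new, empty configuration built only from Hadoop files,
  // which misses anything set through Spark's config.
  def freshConf(): ConfStub = new ConfStub
}
```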
Yin Huai
3eca8d2947
Add a method to wait for the experiment to finish (waitForFinish).
2015-05-22 12:41:55 -07:00
Yin Huai
70da4f490e
Move dataframe into benchmark.
2015-05-16 19:31:55 -07:00
Yin Huai
9156e14f4b
Provide userSpecifiedBaseDir to access a dataset that is not in the path with the default format.
2015-05-07 11:01:38 -07:00
Yin Huai
6c5657b609
Refactoring and doc.
2015-04-16 18:10:57 -07:00
Yin Huai
930751810e
Initial port.
2015-04-15 20:03:14 -07:00