spark-sql-perf

Author	SHA1	Message	Date
Nong Li	1aa5bfc838	Add remaining tpcds tables. Author: Nong Li <nongli@gmail.com> Closes #34 from nongli/tpcds.	2015-11-19 13:50:00 -08:00
Nong Li	8d9e8ce9a3	Add another fact table and updates to load a single table at a time. Author: Nong Li <nongli@gmail.com> Closes #31 from nongli/more_tables.	2015-11-18 11:12:01 -08:00
Andrew Or	426ae30a2e	Increase integration surface area with Spark perf The changes in this PR are centered around making `Benchmark#runExperiment` accept things other than `Query`s. In particular, in spark-perf we don't always have a DataFrame or an RDD to work with and may want to run arbitrary code (e.g. ALS.train). This PR makes it possible to use the same code in `Benchmark` to do this. I tested this on dogfood and it works well there. Author: Andrew Or <andrew@databricks.com> Closes #33 from andrewor14/spark-perf.	2015-11-18 10:50:46 -08:00
Andrew Or	172ae79f8d	Introduce small integration point with Spark perf This allows us to report Spark perf results in the same format as SQL benchmark results. marmbrus Author: Andrew Or <andrew@databricks.com> Closes #30 from andrewor14/spark-perf.	2015-11-16 17:46:53 -08:00
Michael Armbrust	344b31ed69	Update to Spark 1.6 Some internal interfaces changed, so we need to bump the Spark version to run tests on Spark 1.6. Author: Michael Armbrust <michael@databricks.com> Closes #29 from marmbrus/spark16.	2015-11-13 12:40:00 -08:00
Nong Li	dc48f2e49b	Support generating the data as "text". This previously failed since text only supports a single column. Having the option of text output is useful to quickly see what the generator is doing. Author: Nong Li <nongli@gmail.com> Closes #27 from nongli/text.	2015-11-11 12:05:14 -08:00
bit1129	f63d40ce9f	Add 2 queries Author: bit1129 <bit1129@gmail.com> Closes #22 from bit1129/master.	2015-09-16 10:10:20 -07:00
Michael Armbrust	40d085f1c7	Add dashboard notebook Author: Michael Armbrust <michael@databricks.com> Closes #21 from marmbrus/master.	2015-09-11 17:46:07 -07:00
Michael Armbrust	f03b3af719	Fail gracefully when invalid CPU logs are encountered Author: Michael Armbrust <michael@databricks.com> Closes #18 from marmbrus/parseCpuFail.	2015-09-09 22:02:23 -07:00
Michael Armbrust	e2dc749480	Add more tests for join performance Author: Michael Armbrust <michael@databricks.com> Closes #17 from marmbrus/joinPerf.	2015-09-09 21:56:47 -07:00
Michael Armbrust	08cb68ca20	Make it easier to write benchmarks in notebooks Author: Michael Armbrust <michael@databricks.com> Closes #19 from marmbrus/notebookTests.	2015-09-09 21:49:50 -07:00
Yin Huai	34f66a0a10	Add an option to filter rows with null partition column values.	2015-08-26 11:14:19 -07:00
Yin Huai	f4e20af107	fix typo	2015-08-25 23:31:50 -07:00
Yin Huai	06eb11f326	Fix the seed to 100 and use distribute by instead of order by.	2015-08-25 20:44:14 -07:00
Yin Huai	9936d49239	Add an option to orderBy partition columns.	2015-08-25 20:44:14 -07:00
Yin Huai	58188c6711	Allow users to use double instead of decimal for generated tables.	2015-08-25 20:44:14 -07:00
Yin Huai	77fbe22b7b	address comments.	2015-08-25 20:44:13 -07:00
Yin Huai	97093a45cd	Update readme and register temp tables.	2015-08-25 20:44:13 -07:00
Yin Huai	edb4daba80	Bug fix.	2015-08-25 20:44:13 -07:00
Yin Huai	544adce70f	Add methods to genData.	2015-08-25 20:44:13 -07:00
Michael Armbrust	32215e05ee	Block completion of cpu collection	2015-08-24 16:13:26 -07:00
Michael Armbrust	00aa49e8e4	Add support for CPU Profiling.	2015-08-20 16:46:12 -07:00
Yin Huai	249157f6a6	Fix typo.	2015-08-17 12:56:35 -07:00
Yin Huai	d5c3104ec6	address comments.	2015-08-14 11:39:06 -07:00
Yin Huai	51546868f4	You can specify the perf result location.	2015-08-13 18:43:50 -07:00
Yin Huai	11bfdc7c5a	Add an ExecutionMode to check query results.	2015-08-13 18:43:49 -07:00
Michael Armbrust	ed8ddfedcd	Yin's comments	2015-08-13 17:54:00 -07:00
Michael Armbrust	4101a1e968	Fixes to breakdown calculation and table creation.	2015-08-13 15:47:01 -07:00
Michael Armbrust	a239da90a2	more cleanup, update readme	2015-08-11 15:51:34 -07:00
Michael Armbrust	51b9dcb5b5	Merge remote-tracking branch 'origin/master' into refactor Conflicts: src/main/scala/com/databricks/spark/sql/perf/bigdata/Queries.scala src/main/scala/com/databricks/spark/sql/perf/query.scala src/main/scala/com/databricks/spark/sql/perf/runBenchmarks.scala src/main/scala/com/databricks/spark/sql/perf/table.scala src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/ImpalaKitQueries.scala src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/SimpleQueries.scala	2015-08-07 15:31:32 -07:00
Jean-Yves Stephan	9421522820	Closing bracket	2015-07-22 15:03:43 -07:00
Yin Huai	a50fedd5bc	Merge pull request #2 from jystephan/master Allow saving benchmark queries results as parquet files	2015-07-22 13:40:39 -07:00
Jean-Yves Stephan	653d82134d	No collect before saveAsParquet	2015-07-22 13:30:40 -07:00
Michael Armbrust	f00ad77985	with data generation	2015-07-22 00:29:58 -07:00
Jean-Yves Stephan	a4a53b8a73	Took Aaron's comments	2015-07-21 20:05:53 -07:00
Jean-Yves Stephan	d866cce1a1	Format	2015-07-21 13:27:50 -07:00
Jean-Yves Stephan	933f3f0bb5	Removed queryOutputLocation parameter	2015-07-21 13:26:50 -07:00
Jean-Yves Stephan	9640cd8c1e	The execution mode (collect results / foreach results / writeparquet) is now specified as an argument to Query.	2015-07-21 13:23:11 -07:00
Jean-Yves Stephan	8e62e4fdbd	Added optional parameters to runBenchmark to specify a location to save queries outputs as parquet files. + Removed the hardcoded baseDir/parquet/ structure	2015-07-20 17:09:20 -07:00
Michael Armbrust	eba8cea93c	Basic join performance tests	2015-07-13 16:20:36 -07:00
Michael Armbrust	eb3dd30c35	Refactor to work in notebooks	2015-07-03 11:26:06 -07:00
Pace Francesco	4f4b08a122	Reading hadoopConfiguration from Spark. Read hadoopConfiguration from SparkContext instead of creating a new Configuration directly from Hadoop config files. This allows us to use Hadoop parameters inserted or modified in one of the Spark config files (e.g., Swift credentials).	2015-06-19 15:01:57 +02:00
Yin Huai	3eca8d2947	Add a method to wait for the experiment to finish (waitForFinish).	2015-05-22 12:41:55 -07:00
Yin Huai	70da4f490e	Move dataframe into benchmark.	2015-05-16 19:31:55 -07:00
Yin Huai	9156e14f4b	Provide userSpecifiedBaseDir to access a dataset that is not in the path with the default format.	2015-05-07 11:01:38 -07:00
Yin Huai	6c5657b609	Refactoring and doc.	2015-04-16 18:10:57 -07:00
Yin Huai	930751810e	Initial port.	2015-04-15 20:03:14 -07:00

47 Commits