Yin Huai
3af656defa
Make ExecutionMode.HashResults handle null values
...
In Spark 1.6, `getLong` throws an exception when the value is null; before 1.6 it returned 0. With this PR, we check whether the result is null and, if it is, return null instead of 0.
Author: Yin Huai <yhuai@databricks.com>
Closes #41 from yhuai/fixSumHash.
2015-12-08 15:28:48 -08:00
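A minimal sketch of the fix described in the entry above, not the actual PR code: the nullable SQL column is modeled here as an `Option[Long]` standing in for Spark's `Row` API, and null is propagated instead of 0.

```scala
// Hypothetical helper illustrating the null check: in Spark 1.6, getLong on a
// null value throws (before 1.6 it returned 0), so the value is inspected for
// null first and null is returned rather than 0.
object HashResultSketch {
  def hashResult(value: Option[Long]): Any = value match {
    case Some(v) => v    // non-null: return the long value as before
    case None    => null // null: propagate null instead of 0
  }
}
```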
Nong Li
43c2f23bb9
Fixes for Q34 and Q73 to return results deterministically.
...
Author: Nong Li <nong@databricks.com>
Closes #38 from nongli/tpcds.
2015-11-25 15:03:33 -08:00
Nong
70e0dbe656
Add official TPCDS 1.4 queries.
...
Author: Nong <nong@cloudera.com>
Closes #36 from nongli/tpcds.
2015-11-24 13:12:46 -08:00
Nong Li
1aa5bfc838
Add remaining tpcds tables.
...
Author: Nong Li <nongli@gmail.com>
Closes #34 from nongli/tpcds.
2015-11-19 13:50:00 -08:00
Nong Li
8d9e8ce9a3
Add another fact table and updates to load a single table at a time.
...
Author: Nong Li <nongli@gmail.com>
Closes #31 from nongli/more_tables.
2015-11-18 11:12:01 -08:00
Andrew Or
426ae30a2e
Increase integration surface area with Spark perf
...
The changes in this PR are centered around making `Benchmark#runExperiment` accept things other than `Query`s. In particular, in spark-perf we don't always have a DataFrame or an RDD to work with and may want to run arbitrary code (e.g. ALS.train). This PR makes it possible to use the same code in `Benchmark` to do this.
I tested this on dogfood and it works well there.
Author: Andrew Or <andrew@databricks.com>
Closes #33 from andrewor14/spark-perf.
2015-11-18 10:50:46 -08:00
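A hypothetical sketch of the generalization described above (the trait and class names are illustrative, not the real spark-sql-perf API): `runExperiment` accepts any timeable item rather than only a `Query`, so arbitrary code such as an ML training call can be measured with the same machinery.

```scala
// Anything the benchmark can time.
trait Benchmarkable {
  def name: String
  def run(): Unit
}

// A SQL query; in the real project this would execute against Spark.
final case class QuerySketch(name: String, sql: String) extends Benchmarkable {
  def run(): Unit = ()
}

// Wraps an arbitrary block of code, e.g. a call like ALS.train(...).
final case class ArbitraryCode(name: String, body: () => Unit) extends Benchmarkable {
  def run(): Unit = body()
}

object BenchmarkSketch {
  // Times each item and returns (name, elapsed nanoseconds).
  def runExperiment(items: Seq[Benchmarkable]): Seq[(String, Long)] =
    items.map { item =>
      val start = System.nanoTime()
      item.run()
      (item.name, System.nanoTime() - start)
    }
}
```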
Andrew Or
172ae79f8d
Introduce small integration point with Spark perf
...
This allows us to report Spark perf results in the same format as SQL benchmark results. marmbrus
Author: Andrew Or <andrew@databricks.com>
Closes #30 from andrewor14/spark-perf.
2015-11-16 17:46:53 -08:00
Michael Armbrust
344b31ed69
Update to Spark 1.6
...
Some internal interfaces changed, so we need to bump the Spark version to run tests on Spark 1.6.
Author: Michael Armbrust <michael@databricks.com>
Closes #29 from marmbrus/spark16.
2015-11-13 12:40:00 -08:00
Nong Li
dc48f2e49b
Support generating the data as "text".
...
This previously failed because the text format supports only a single column. Having the option of text output is useful for quickly seeing what the generator is doing.
Author: Nong Li <nongli@gmail.com>
Closes #27 from nongli/text.
2015-11-11 12:05:14 -08:00
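An illustrative helper, not the actual generator code: since the text data source accepts only a single string column, a row's fields would first be flattened into one delimited string before writing.

```scala
// Flattens a row's fields into a single delimited string so it can be written
// through a one-column text output; nulls become empty fields.
object TextRowSketch {
  def toTextRow(fields: Seq[Any], sep: String = "|"): String =
    fields.map(f => if (f == null) "" else f.toString).mkString(sep)
}
```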
bit1129
f63d40ce9f
Add 2 queries
...
Author: bit1129 <bit1129@gmail.com>
Closes #22 from bit1129/master.
2015-09-16 10:10:20 -07:00
Michael Armbrust
40d085f1c7
Add dashboard notebook
...
Author: Michael Armbrust <michael@databricks.com>
Closes #21 from marmbrus/master.
2015-09-11 17:46:07 -07:00
Michael Armbrust
f03b3af719
Fail gracefully when invalid CPU logs are encountered
...
Author: Michael Armbrust <michael@databricks.com>
Closes #18 from marmbrus/parseCpuFail.
2015-09-09 22:02:23 -07:00
Michael Armbrust
e2dc749480
Add more tests for join performance
...
Author: Michael Armbrust <michael@databricks.com>
Closes #17 from marmbrus/joinPerf.
2015-09-09 21:56:47 -07:00
Michael Armbrust
08cb68ca20
Make it easier to write benchmarks in notebooks
...
Author: Michael Armbrust <michael@databricks.com>
Closes #19 from marmbrus/notebookTests.
2015-09-09 21:49:50 -07:00
Yin Huai
34f66a0a10
Add an option to filter rows with null partition column values.
2015-08-26 11:14:19 -07:00
Yin Huai
f4e20af107
Fix typo.
2015-08-25 23:31:50 -07:00
Yin Huai
06eb11f326
Fix the seed to 100 and use distribute by instead of order by.
2015-08-25 20:44:14 -07:00
Yin Huai
9936d49239
Add an option to orderBy partition columns.
2015-08-25 20:44:14 -07:00
Yin Huai
58188c6711
Allow users to use double instead of decimal for generated tables.
2015-08-25 20:44:14 -07:00
Yin Huai
77fbe22b7b
Address comments.
2015-08-25 20:44:13 -07:00
Yin Huai
97093a45cd
Update readme and register temp tables.
2015-08-25 20:44:13 -07:00
Yin Huai
edb4daba80
Bug fix.
2015-08-25 20:44:13 -07:00
Yin Huai
544adce70f
Add methods to genData.
2015-08-25 20:44:13 -07:00
Michael Armbrust
32215e05ee
Block completion of cpu collection
2015-08-24 16:13:26 -07:00
Michael Armbrust
00aa49e8e4
Add support for CPU Profiling.
2015-08-20 16:46:12 -07:00
Yin Huai
249157f6a6
Fix typo.
2015-08-17 12:56:35 -07:00
Yin Huai
d5c3104ec6
Address comments.
2015-08-14 11:39:06 -07:00
Yin Huai
51546868f4
Allow specifying the perf result location.
2015-08-13 18:43:50 -07:00
Yin Huai
11bfdc7c5a
Add an ExecutionMode to check query results.
2015-08-13 18:43:49 -07:00
Michael Armbrust
ed8ddfedcd
Address Yin's comments.
2015-08-13 17:54:00 -07:00
Michael Armbrust
4101a1e968
Fixes to breakdown calculation and table creation.
2015-08-13 15:47:01 -07:00
Michael Armbrust
a239da90a2
More cleanup; update readme.
2015-08-11 15:51:34 -07:00
Michael Armbrust
51b9dcb5b5
Merge remote-tracking branch 'origin/master' into refactor
...
Conflicts:
src/main/scala/com/databricks/spark/sql/perf/bigdata/Queries.scala
src/main/scala/com/databricks/spark/sql/perf/query.scala
src/main/scala/com/databricks/spark/sql/perf/runBenchmarks.scala
src/main/scala/com/databricks/spark/sql/perf/table.scala
src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/ImpalaKitQueries.scala
src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/SimpleQueries.scala
2015-08-07 15:31:32 -07:00
Jean-Yves Stephan
9421522820
Closing bracket
2015-07-22 15:03:43 -07:00
Yin Huai
a50fedd5bc
Merge pull request #2 from jystephan/master
...
Allow saving benchmark queries results as parquet files
2015-07-22 13:40:39 -07:00
Jean-Yves Stephan
653d82134d
No collect before saveAsParquet
2015-07-22 13:30:40 -07:00
Michael Armbrust
f00ad77985
with data generation
2015-07-22 00:29:58 -07:00
Jean-Yves Stephan
a4a53b8a73
Addressed Aaron's comments.
2015-07-21 20:05:53 -07:00
Jean-Yves Stephan
d866cce1a1
Format
2015-07-21 13:27:50 -07:00
Jean-Yves Stephan
933f3f0bb5
Removed queryOutputLocation parameter
2015-07-21 13:26:50 -07:00
Jean-Yves Stephan
9640cd8c1e
The execution mode (collect results / foreach results / write parquet) is now specified as an argument to Query.
2015-07-21 13:23:11 -07:00
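A hypothetical sketch of the modes named in the entry above (the type and case names are illustrative, not the real project's definitions): representing the execution mode as a sealed trait lets it be passed to a query instead of being hardcoded.

```scala
// Illustrative model of per-query execution modes.
sealed trait ExecutionMode
object ExecutionMode {
  case object CollectResults extends ExecutionMode // collect rows to the driver
  case object ForeachResults extends ExecutionMode // iterate without collecting
  final case class WriteParquet(location: String) extends ExecutionMode // save output
}

// The mode is an argument with a sensible default.
final case class QuerySpec(
    name: String,
    sql: String,
    mode: ExecutionMode = ExecutionMode.CollectResults)
```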
Jean-Yves Stephan
8e62e4fdbd
Added optional parameters to runBenchmark to specify a location to save query outputs as parquet files.
...
+ Removed the hardcoded baseDir/parquet/ structure
2015-07-20 17:09:20 -07:00
Michael Armbrust
eba8cea93c
Basic join performance tests
2015-07-13 16:20:36 -07:00
Michael Armbrust
eb3dd30c35
Refactor to work in notebooks
2015-07-03 11:26:06 -07:00
Pace Francesco
4f4b08a122
Reading hadoopConfiguration from Spark.
...
Read hadoopConfiguration from SparkContext instead of creating a new Configuration directly from Hadoop config files.
This allows us to use Hadoop parameters inserted or modified in one of Spark's config files (e.g., Swift credentials).
2015-06-19 15:01:57 +02:00
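A minimal model of the change above; the stub classes stand in for Hadoop's `Configuration` and Spark's `SparkContext`, which are assumed and not on the classpath here. Taking the configuration from the SparkContext preserves settings injected via Spark's config files, which a freshly constructed Configuration would miss.

```scala
// Stubs modeling the relevant pieces of Hadoop/Spark.
class ConfStub { var settings: Map[String, String] = Map.empty }
class SparkContextStub {
  val hadoopConfiguration = new ConfStub // Spark-managed Hadoop settings live here
}

object HadoopConfSketch {
  // After the change: reuse the SparkContext's configuration.
  def hadoopConf(sc: SparkContextStub): ConfStub = sc.hadoopConfiguration
  // Before the change: a new, empty configuration built only from Hadoop files,
  // which misses anything set through Spark's config.
  def freshConf(): ConfStub = new ConfStub
}
```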
Yin Huai
3eca8d2947
Add a method to wait for the experiment to finish (waitForFinish).
2015-05-22 12:41:55 -07:00
Yin Huai
70da4f490e
Move dataframe into benchmark.
2015-05-16 19:31:55 -07:00
Yin Huai
9156e14f4b
Provide userSpecifiedBaseDir to access a dataset that is not in the path with the default format.
2015-05-07 11:01:38 -07:00
Yin Huai
6c5657b609
Refactoring and doc.
2015-04-16 18:10:57 -07:00
Yin Huai
930751810e
Initial port.
2015-04-15 20:03:14 -07:00