Nong Li
1aa5bfc838
Add remaining tpcds tables.
...
Author: Nong Li <nongli@gmail.com>
Closes #34 from nongli/tpcds.
2015-11-19 13:50:00 -08:00
Nong Li
8d9e8ce9a3
Add another fact table and updates to load a single table at a time.
...
Author: Nong Li <nongli@gmail.com>
Closes #31 from nongli/more_tables.
2015-11-18 11:12:01 -08:00
Andrew Or
426ae30a2e
Increase integration surface area with Spark perf
...
The changes in this PR are centered around making `Benchmark#runExperiment` accept things other than `Query`s. In particular, in spark-perf we don't always have a DataFrame or an RDD to work with and may want to run arbitrary code (e.g. ALS.train). This PR makes it possible to use the same code in `Benchmark` to do this.
I tested this on dogfood and it works well there.
Author: Andrew Or <andrew@databricks.com>
Closes #33 from andrewor14/spark-perf.
2015-11-18 10:50:46 -08:00
Andrew Or
172ae79f8d
Introduce small integration point with Spark perf
...
This allows us to report Spark perf results in the same format as SQL benchmark results. marmbrus
Author: Andrew Or <andrew@databricks.com>
Closes #30 from andrewor14/spark-perf.
2015-11-16 17:46:53 -08:00
Michael Armbrust
344b31ed69
Update to Spark 1.6
...
Some internal interfaces changed, so we need to bump the Spark version to run tests on Spark 1.6.
Author: Michael Armbrust <michael@databricks.com>
Closes #29 from marmbrus/spark16.
2015-11-13 12:40:00 -08:00
Nong Li
dc48f2e49b
Support generating the data as "text".
...
This previously failed since text only supports a single column. Having the option of
text output is useful to quickly see what the generator is doing.
Author: Nong Li <nongli@gmail.com>
Closes #27 from nongli/text.
2015-11-11 12:05:14 -08:00
bit1129
f63d40ce9f
Add 2 queries
...
Author: bit1129 <bit1129@gmail.com>
Closes #22 from bit1129/master.
2015-09-16 10:10:20 -07:00
Michael Armbrust
40d085f1c7
Add dashboard notebook
...
Author: Michael Armbrust <michael@databricks.com>
Closes #21 from marmbrus/master.
2015-09-11 17:46:07 -07:00
Michael Armbrust
f03b3af719
Fail gracefully when invalid CPU logs are encountered
...
Author: Michael Armbrust <michael@databricks.com>
Closes #18 from marmbrus/parseCpuFail.
2015-09-09 22:02:23 -07:00
Michael Armbrust
e2dc749480
Add more tests for join performance
...
Author: Michael Armbrust <michael@databricks.com>
Closes #17 from marmbrus/joinPerf.
2015-09-09 21:56:47 -07:00
Michael Armbrust
08cb68ca20
Make it easier to write benchmarks in notebooks
...
Author: Michael Armbrust <michael@databricks.com>
Closes #19 from marmbrus/notebookTests.
2015-09-09 21:49:50 -07:00
Yin Huai
34f66a0a10
Add an option to filter rows with null partition column values.
2015-08-26 11:14:19 -07:00
Yin Huai
f4e20af107
fix typo
2015-08-25 23:31:50 -07:00
Yin Huai
06eb11f326
Fix the seed to 100 and use distribute by instead of order by.
2015-08-25 20:44:14 -07:00
Yin Huai
9936d49239
Add an option to orderBy partition columns.
2015-08-25 20:44:14 -07:00
Yin Huai
58188c6711
Allow users to use double instead of decimal for generated tables.
2015-08-25 20:44:14 -07:00
Yin Huai
77fbe22b7b
address comments.
2015-08-25 20:44:13 -07:00
Yin Huai
97093a45cd
Update readme and register temp tables.
2015-08-25 20:44:13 -07:00
Yin Huai
edb4daba80
Bug fix.
2015-08-25 20:44:13 -07:00
Yin Huai
544adce70f
Add methods to genData.
2015-08-25 20:44:13 -07:00
Michael Armbrust
32215e05ee
Block completion of cpu collection
2015-08-24 16:13:26 -07:00
Michael Armbrust
00aa49e8e4
Add support for CPU Profiling.
2015-08-20 16:46:12 -07:00
Yin Huai
249157f6a6
Fix typo.
2015-08-17 12:56:35 -07:00
Yin Huai
d5c3104ec6
address comments.
2015-08-14 11:39:06 -07:00
Yin Huai
51546868f4
You can specify the perf result location.
2015-08-13 18:43:50 -07:00
Yin Huai
11bfdc7c5a
Add an ExecutionMode to check query results.
2015-08-13 18:43:49 -07:00
Michael Armbrust
ed8ddfedcd
Yin's comments
2015-08-13 17:54:00 -07:00
Michael Armbrust
4101a1e968
Fixes to breakdown calculation and table creation.
2015-08-13 15:47:01 -07:00
Michael Armbrust
a239da90a2
more cleanup, update readme
2015-08-11 15:51:34 -07:00
Michael Armbrust
51b9dcb5b5
Merge remote-tracking branch 'origin/master' into refactor
...
Conflicts:
src/main/scala/com/databricks/spark/sql/perf/bigdata/Queries.scala
src/main/scala/com/databricks/spark/sql/perf/query.scala
src/main/scala/com/databricks/spark/sql/perf/runBenchmarks.scala
src/main/scala/com/databricks/spark/sql/perf/table.scala
src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/ImpalaKitQueries.scala
src/main/scala/com/databricks/spark/sql/perf/tpcds/queries/SimpleQueries.scala
2015-08-07 15:31:32 -07:00
Jean-Yves Stephan
9421522820
Closing bracket
2015-07-22 15:03:43 -07:00
Yin Huai
a50fedd5bc
Merge pull request #2 from jystephan/master
...
Allow saving benchmark query results as parquet files
2015-07-22 13:40:39 -07:00
Jean-Yves Stephan
653d82134d
No collect before saveAsParquet
2015-07-22 13:30:40 -07:00
Michael Armbrust
f00ad77985
with data generation
2015-07-22 00:29:58 -07:00
Jean-Yves Stephan
a4a53b8a73
Took Aaron's comments
2015-07-21 20:05:53 -07:00
Jean-Yves Stephan
d866cce1a1
Format
2015-07-21 13:27:50 -07:00
Jean-Yves Stephan
933f3f0bb5
Removed queryOutputLocation parameter
2015-07-21 13:26:50 -07:00
Jean-Yves Stephan
9640cd8c1e
The execution mode (collect results / foreach results / writeparquet) is now specified as an argument to Query.
2015-07-21 13:23:11 -07:00
Jean-Yves Stephan
8e62e4fdbd
Added optional parameters to runBenchmark to specify a location to save query outputs as parquet files.
...
+ Removed the hardcoded baseDir/parquet/ structure
2015-07-20 17:09:20 -07:00
Michael Armbrust
eba8cea93c
Basic join performance tests
2015-07-13 16:20:36 -07:00
Michael Armbrust
eb3dd30c35
Refactor to work in notebooks
2015-07-03 11:26:06 -07:00
Pace Francesco
4f4b08a122
Reading hadoopConfiguration from Spark.
...
Read hadoopConfiguration from SparkContext instead of creating a new Configuration directly from Hadoop config files.
This allows us to use Hadoop parameters inserted or modified in one of the Spark config files (e.g. Swift credentials).
2015-06-19 15:01:57 +02:00
Yin Huai
3eca8d2947
Add a method to wait for the experiment to finish (waitForFinish).
2015-05-22 12:41:55 -07:00
Yin Huai
70da4f490e
Move dataframe into benchmark.
2015-05-16 19:31:55 -07:00
Yin Huai
9156e14f4b
Provide userSpecifiedBaseDir to access a dataset that is not in the path with the default format.
2015-05-07 11:01:38 -07:00
Yin Huai
6c5657b609
Refactoring and doc.
2015-04-16 18:10:57 -07:00
Yin Huai
930751810e
Initial port.
2015-04-15 20:03:14 -07:00