spark-sql-perf

github/spark-sql-perf

304fdaf81a Add single-table parameter master Your Name 2026-01-07 12:47:26 +0800
28d88190f6 Update Spark repository for sbt (#206) Frank Luan 2021-12-13 02:20:13 -0800
ca4ccea3dd Add a convenient class to generate TPC-DS data (#196) Yuming Wang 2021-03-30 20:19:36 +0800
65785a8a04 Fix Travis CI JDK installation (#195) Yuming Wang 2021-01-29 00:28:46 +0800
d85f75bb38 Update for Spark 3.0.0 compatibility (#191) Nico Poggi 2020-11-03 15:27:34 +0100
6b2bf9f9ad Fix files truncating according to maxRecordPerFile (#180) Guo Chenzhao 2019-05-29 23:20:01 +0800
3f92a094cc Bumping version to 0.5.1-SNAPSHOT (spark 3, scala 2.12, log4j ) (#168) Nico Poggi 2019-01-29 10:00:54 +0100
e1e1365a87 Updates for Spark 3.0 and Scala 2.12 compatibility (#176) Luca Canali 2019-01-29 09:58:52 +0100
85bbfd4ca2 [ML-5437] Build with spark-2.4.0 and resolve build issues (#174) Bago Amirbekian 2018-11-09 16:21:22 -0800
d44caec277 Revert "Update Scala Logging to officially supported one " (#172) Nico Poggi 2018-10-19 17:33:34 +0200
0367ff65a6 Coalesce(n) instead of hardcoded (1) for large tables/partitions Nico Poggi 2018-10-16 11:21:05 +0200
3c1c9e9070 Rebase for PR 87: Add -m for custom master, use SBT_HOME if set (#169) Nico Poggi 2018-09-17 15:18:16 +0200
d9a41a1204 Fix 3 local benchmark classes (#165) Phil 2018-09-17 13:08:56 +0100
aac7eb54c1 Fixing TPCH DDL datatype of customer.c_nationkey from string to long (#167) Nico Poggi 2018-09-13 12:00:29 +0200
56f73482d7 Update Scala Logging to officially supported one Piotr Mrówczyński 2018-09-11 12:17:06 +0200
6136ecea6e TPC-H datagenerator and instructions (#136) Nico Poggi 2018-09-10 23:18:33 +0200
8bbeae664d Adds an optional version of the SS_MAX query (#137) Nico Poggi 2018-09-10 22:54:02 +0200
bf55bdb987 Make queryNames public so it can be accessed from notebooks. (#166) Nico Poggi 2018-09-10 22:53:20 +0200
bb12958874 Fix compile for Spark 2.4 SNAPSHOT and only catch NonFatal (#164) Xiangrui Meng 2018-09-10 08:49:31 -0700
0ab6bf606b Benchmark for SparkR UDF *apply() APIs Liang Zhang 2018-07-12 17:12:35 -0700
8e8c08d75b [ML-4154] Added testing for before/after of ml benchmarks. (#162) Bago Amirbekian 2018-07-12 16:43:54 -0700
107495afe2 [ML-4069] Improve timing of estimators (#161) Joseph Bradley 2018-07-09 17:41:44 -0700
30c50dddbb [ML-2918] Call count() in default score() to improve timing of transform() (#159) Joseph Bradley 2018-07-08 16:09:24 -0700
1798b12077 change large test timeout to 12 hours (#160) Xiangrui Meng 2018-07-04 15:32:00 -0700
2895ae1139 update VectorAssembler test such that the dataset size is numExamples * numFeatures (#158) Xiangrui Meng 2018-07-03 17:16:36 -0700
e9ef9788c2 [ML-3844] Add GBTRegression benchmark (#156) ludatabricks 2018-06-27 09:17:38 -0700
e8aa132bb8 [ML-3870] Make spark-sql-perf master compiled with spark 2.3 and scala 2.11 (#155) ludatabricks 2018-06-15 06:40:14 -0700
49717a72dd put additionalTests to mlmetrics (#153) ludatabricks 2018-06-13 15:21:50 -0700
a4e1c790ba [ML-3869] Make Quantilediscretizer work with spark-2.3 (#154) ludatabricks 2018-06-13 15:19:52 -0700
51786921a6 [ML-3583] Add benchmarks to mllib-large.yaml for featurization (#152) ludatabricks 2018-06-12 17:31:30 -0700
aa1587fec5 [ML-3824] Add benchmarks to mllib-large.yaml for FPGrowth (#151) ludatabricks 2018-06-12 13:10:12 -0700
6a45dc8a2d [ML-3581] Add benchmarks to mllib-large.yaml for regression (#150) ludatabricks 2018-06-12 10:32:02 -0700
9ab2a8bb14 [ML-3585] Added benchmarks to mllib-large.yaml for clustering (#149) ludatabricks 2018-06-08 12:06:52 -0700
62b173d779 Output Training Time as metrics (#148) ludatabricks 2018-06-07 13:21:32 -0700
d9984e1c0a [ML-3584] Added benchmarks to mllib-large.yaml for ALS (#147) ludatabricks 2018-06-07 08:11:37 -0700
93626c11b4 [ML-3775] Add "benchmarkId" to BenchmarkResult (#146) ludatabricks 2018-06-04 14:13:45 -0700
f1139fc742 [ML-3753] Log "value" instead of "Some(value)" for ML params in results (#145) ludatabricks 2018-06-04 11:09:41 -0700
1768d376f9 [ML-3749] Log metric name and isLargerBetter in BenchmarkResult (#144) ludatabricks 2018-06-01 15:49:16 -0700
789a0f5b8b Added benchmarks to mllib-large.yaml for classifcation Estimators. (#143) Bago Amirbekian 2018-05-30 08:18:49 -0700
3786a8391e Quantile discretizer benchmark (#135) WeichenXu 2018-05-18 02:55:00 +0800
15d9283473 Run mllib small in unit tests (#141) Bago Amirbekian 2018-05-09 16:24:30 -0700
9ece11ff20 Add decision tree benchmark (#140) Bago Amirbekian 2018-05-08 21:44:11 -0700
ed9bbb01a5 fix bug with ML additional method tests (#142) Joseph Bradley 2018-05-08 13:23:22 -0700
be4459fe41 Additional method test for some ML algos (#139) WeichenXu 2018-05-03 04:45:58 +0800
5af9f6dfc2 Word2Vec benchmark (#127) WeichenXu 2018-03-16 04:10:04 +0800
a8acd53fdd Use DECIMAL and DATE in the default TPCDS notebooks. (#130) Juliusz Sompolski 2018-03-07 21:44:42 +0100
b7ac7e55ae Remove VACUUM from tpcds_datagen notebook. (#129) Juliusz Sompolski 2018-03-07 15:36:27 +0100
93a34553f0 MinHashLSH and BucketedRandomProjectionLSH benchmark #128 WeichenXu 2018-03-03 07:21:37 +0800
6d01ac94a1 [ML-3342] Bug fixes to make mllib benchmarks work with dbr-4.0. (#125) Bago Amirbekian 2018-03-02 09:12:38 -0800
91604a3ab0 Update README to specify that TPCDS kit needs to be installed on all nodes. Juliusz Sompolski 2018-02-27 12:06:12 +0100
31f34beee5 Update README to do sql("use database") (#123) Juliusz Sompolski 2017-11-07 20:38:26 +0100
7bf2d45b0f Don't clean blocks after every run in Benchmarkable (#119) Juliusz Sompolski 2017-09-18 11:51:12 +0200
fdd0e38717 TPCDS notebooks in source, not binary format (#121) Juliusz Sompolski 2017-09-13 14:57:59 +0200
006f096562 Merge pull request #120 from juliuszsompolski/tpcds_notebooks Nico Poggi 2017-09-12 17:22:38 +0200
5ebb9cfb12 add some more comments Juliusz Sompolski 2017-09-12 16:51:26 +0200
c78f2b3a9b update readme Juliusz Sompolski 2017-09-12 16:40:23 +0200
ae8bcdb292 add notebooks Juliusz Sompolski 2017-09-12 15:43:08 +0200
f08bf31d18 add benchmark for FPGrowth (#113) WeichenXu 2017-09-05 01:48:05 +0800
bcda8fc1e5 Coalesce non-partitioned tables. (#118) Juliusz Sompolski 2017-09-04 18:05:42 +0200
3e1bbd00ed [ML-2847] Add new tests for (DecisionTree, RandomForest)Regression, GMM, HashingTF (#116) Siddharth Murching 2017-09-03 22:26:20 -0700
19c41464c7 fix df.drop in VectorAssembler (#117) WeichenXu 2017-09-02 04:51:05 +0800
6ec83fd0f7 Add benchmark for LinearSVC/OnehotEncoder/VectorSlicer/VectorAssembler/StringIndexer/Tokenizer (#112) WeichenXu 2017-09-01 04:56:43 +0800
737a1bc355 BlockingLineStream (#115) Juliusz Sompolski 2017-08-31 15:16:22 +0200
9febc34f66 Refactor MLParams for spark-sql-perf (#114) Siddharth Murching 2017-08-28 13:23:59 -0700
d0de5ae8aa Update tests to run with Spark 2.2, add NaiveBayes & Bucketizer ML tests (#110) Siddharth Murching 2017-08-21 15:07:46 -0700
b3a6ed79b3 Start the development 0.5.0-SNAPSHOT Yin Huai 2017-08-21 14:21:19 -0700
4e7a2363b9 Support for TPC-H benchmark Bogdan Raducanu 2017-08-09 12:26:32 +0200
fdcde7595c Update README (#107) Kevin 2017-07-13 10:45:24 +0200
6488d74d23 tpcds_2_4: Add alias names to subqueries in FROM clause. Juliusz Sompolski 2017-06-29 02:59:34 +0200
bff6b34f62 Tweaks and improvements (#106) Juliusz Sompolski 2017-06-13 11:42:14 +0200
75f3876e59 Merge pull request #103 from juliuszsompolski/fixtypes Juliusz Sompolski 2017-05-26 11:53:19 +0200
2ddd521ab5 ok, make it long only where really needed. Juliusz Sompolski 2017-05-26 10:36:40 +0200
1bca964a3d Correct types of keys Juliusz Sompolski 2017-05-25 17:12:47 +0200
beec62844d Merge pull request #101 from vlyubin/master Volodymyr Lyubinets 2017-05-16 10:35:35 +0200
c0bd21c2ec Add ss_max vlyubin 2017-05-16 10:29:00 +0200
e5dc6f338f Updated queries 23 vlyubin 2017-05-15 17:30:20 +0200
e8f85b0b0e Moved queries into a separate folder vlyubin 2017-05-15 14:22:37 +0200
96bf10bffc Add tpcds 2.4 queries vlyubin 2017-05-12 11:54:32 +0200
c12b14b013 Merge pull request #98 from databricks/parallel-runs Eric Liang 2017-03-15 13:50:41 -0700
64728c7cff Add option to avoid cleaning after each run, to enable parallel runs Eric Liang 2017-03-14 19:45:27 -0700
53091a1935 Removes labels from tree data generation (#82) Timothy Hunter 2016-12-13 16:47:31 -0800
685c50d9dc Cross build with Scala 2.11 (#91) srinathshankar 2016-10-03 17:01:17 -0700
0eaa4b1d57 [SC-4409] Correct query 41 in TPCDS kit (#90) srinathshankar 2016-09-30 18:02:39 -0700
c2224f37e5 Depend on non-snapshot Spark now that 2.0.0 is released Josh Rosen 2016-08-17 17:53:30 -0700
948c8369e7 Fixes issues with scala 2.11 Timothy Hunter 2016-07-19 11:19:52 -0700
8830bffd46 Merge pull request #79 from jkbradley/tree-test-fix Timothy Hunter 2016-07-11 10:42:19 -0700
51469a34d6 Fixed tree, forest, GBT tests by adding metadata to DataFrames Joseph K. Bradley 2016-07-11 10:33:19 -0700
1fcc366cec Merge pull request #78 from thunterdb/1607-fixes Timothy Hunter 2016-07-06 11:34:05 -0700
c7d42d3626 adding parameters Timothy Hunter 2016-07-06 11:23:07 -0700
2672bcd5b7 ALS algorithm for spark-sql-perf Timothy Hunter 2016-07-05 15:54:08 -0700
93c0407bbe Merge pull request #77 from thunterdb/1607-linear Timothy Hunter 2016-07-05 15:41:35 -0700
40e97ca3c0 comment Timothy Hunter 2016-07-05 15:01:50 -0700
ce7e20ae6d set the solver Timothy Hunter 2016-07-05 13:46:19 -0700
def20479a1 linear regression Timothy Hunter 2016-07-05 13:42:56 -0700
979ebd5d0f Merge pull request #75 from jkbradley/kmeans Timothy Hunter 2016-07-05 10:14:11 -0700
9d11a601c3 added kmeans test Joseph K. Bradley 2016-07-01 18:00:49 -0700
3d3443791c Merge pull request #74 from jkbradley/dt-tests jkbradley 2016-07-01 17:40:16 -0700
495e2716c4 updated per code review. works in local tests Joseph K. Bradley 2016-07-01 17:39:28 -0700
c2f0a35db4 Merge pull request #1 from thunterdb/1606-trees jkbradley 2016-07-01 11:46:41 -0700
813bd8ad59 adding more experiments Timothy Hunter 2016-07-01 10:34:42 -0700
