[ML-3585] Added benchmarks to mllib-large.yaml for clustering (#149)

Benchmarks for clustering are added to mllib-large.yaml.
GaussianMixture, KMeans, and LDA are added. BisectingKMeans is currently missing from spark-sql-perf; this needs to be fixed in the follow-up JIRA: https://databricks.atlassian.net/browse/ML-3834
The parameters are based on the previous benchmarks for the Spark 2.2 QA.
ludatabricks 2018-06-08 12:06:52 -07:00 committed by Xiangrui Meng
parent 62b173d779
commit 9ab2a8bb14

@@ -37,6 +37,28 @@ benchmarks:
      numFeatures: 5000
      numClasses: 2
      smoothing: 1.0
  - name: clustering.GaussianMixture
    params:
      numExamples: 100000
      numTestExamples: 100000
      numFeatures: 1000
      k: 10
      maxIter: 10
      tol: 0.01
  - name: clustering.KMeans
    params:
      k: 50
      maxIter: 20
      tol: 1e-3
  - name: clustering.LDA
    params:
      docLength: 100
      vocabSize: 5000
      k: 60
      maxIter: 20
      optimizer:
        - em
        - online
  - name: recommendation.ALS
    params:
      numExamples: 50000000
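
Note on the list-valued `optimizer` parameter in the clustering.LDA entry: the sketch below illustrates one plausible reading, in which each list-valued parameter is an axis and the benchmark runs once per combination. This is an assumption about how the harness interprets the YAML, not a copy of the spark-sql-perf implementation; `expand_params` is a hypothetical helper written for this illustration, and the dictionary simply mirrors the values in the diff.

```python
from itertools import product


def expand_params(params):
    """Expand list-valued parameters into one dict per combination.

    Hypothetical helper: assumes scalar values are fixed and every
    list-valued entry contributes one axis to a cross product.
    """
    keys = list(params)
    # Wrap scalars in a one-element list so every key contributes an axis.
    axes = [v if isinstance(v, list) else [v] for v in params.values()]
    return [dict(zip(keys, combo)) for combo in product(*axes)]


# Values copied from the clustering.LDA entry in the diff.
lda_params = {
    "docLength": 100,
    "vocabSize": 5000,
    "k": 60,
    "maxIter": 20,
    "optimizer": ["em", "online"],
}

runs = expand_params(lda_params)
for run in runs:
    print(run["optimizer"], run["k"], run["maxIter"])
```

Under that assumption, the LDA entry yields two runs (one per optimizer), while the GaussianMixture and KMeans entries, which contain only scalar values, yield one run each.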