[ML-3585] Added benchmarks to mllib-large.yaml for clustering (#149)
Adds clustering benchmarks to mllib-large.yaml: GaussianMixture, KMeans, and LDA. BisectingKMeans is currently missing from spark-sql-perf; fixing that is tracked in the follow-up JIRA (https://databricks.atlassian.net/browse/ML-3834). The parameters are based on the previous benchmarks for the Spark 2.2 QA.
parent 62b173d779
commit 9ab2a8bb14
@@ -37,6 +37,28 @@ benchmarks:
       numFeatures: 5000
       numClasses: 2
       smoothing: 1.0
+  - name: clustering.GaussianMixture
+    params:
+      numExamples: 100000
+      numTestExamples: 100000
+      numFeatures: 1000
+      k: 10
+      maxIter: 10
+      tol: 0.01
+  - name: clustering.KMeans
+    params:
+      k: 50
+      maxIter: 20
+      tol: 1e-3
+  - name: clustering.LDA
+    params:
+      docLength: 100
+      vocabSize: 5000
+      k: 60
+      maxIter: 20
+      optimizer:
+        - em
+        - online
   - name: recommendation.ALS
     params:
       numExamples: 50000000
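The clustering.LDA entry gives `optimizer` a list (`em`, `online`) rather than a single value. A minimal sketch of how such a list could be read, assuming (not confirmed by this commit) that the benchmark harness treats a list-valued parameter as a set of alternatives and runs every combination. `expand_params` is a hypothetical helper for illustration, not a spark-sql-perf function.

```python
from itertools import product
from typing import Any, Dict, List


def expand_params(params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: expand list-valued params into one concrete run each.

    Assumption: a list means "try each value"; scalars stay fixed.
    """
    keys = list(params)
    # Wrap scalars in a one-element list so every key contributes candidates.
    choices = [params[k] if isinstance(params[k], list) else [params[k]] for k in keys]
    # The Cartesian product of the candidate lists yields every combination.
    return [dict(zip(keys, combo)) for combo in product(*choices)]


# The clustering.LDA params from this commit, written as a Python dict.
lda_params = {
    "docLength": 100,
    "vocabSize": 5000,
    "k": 60,
    "maxIter": 20,
    "optimizer": ["em", "online"],
}

for run in expand_params(lda_params):
    print(run)
```

Under that assumption the LDA entry would produce two runs (one per optimizer), while the GaussianMixture and KMeans entries, which contain only scalars, would each produce a single run.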