Commit Graph

1099 Commits

Author SHA1 Message Date
jiaoqingbo
318157e3e9
[CELEBORN-305] Change the parameter passed in the registerShuffle method to numPartitions instead of numMappers (#1240) 2023-02-15 17:35:43 +08:00
jiaoqingbo
bd9e0ddc1f
[CELEBORN-304] Missing setIfMissing celeborn.$module.io.serverThreads (#1238) 2023-02-15 15:49:08 +08:00
Shuang
75c83093f2
[CELEBORN-296] fix map partition commit using wrong partitionId and result (#1233) 2023-02-14 20:54:06 +08:00
Ethan Feng
d391e7d91d
[CELEBORN-300] fix a bug about non-leader master try to update partition size. (#1235) 2023-02-14 15:44:20 +08:00
zhongqiangchen
4fb5b3d547
[CELEBORN-298] Fix the wrong configuration name in readme and conf.template (#1234) 2023-02-14 13:38:03 +08:00
Rex(Hui) An
2068e6ae37
[CELEBORN-279] Add user level push data speed metric (#1213) 2023-02-13 12:04:44 +08:00
Ethan Feng
19bcea217e
[CELEBORN-286] Fix source package contains license-binary. (#1219) 2023-02-13 11:44:14 +08:00
jiaoqingbo
3a92b0d911
[CELEBORN-284] fix typo in CelebornConf (#1218)
Co-authored-by: jiaoqb <jiaoqb@asiainfo.com>
2023-02-10 14:59:36 +08:00
Rex(Hui) An
adb6592d31
[CELEBORN-277] PushDataHandle callback could miss soft split status (#1212) 2023-02-09 14:57:18 +08:00
Angerszhuuuu
dae58a664c
[CELEBORN-239][FOLLOWUP] PUSH_DATA_TIMEOUT_MASTER/SLAVE should support convert through RPC (#1211)
2023-02-08 17:16:29 +08:00
Rex(Hui) An
f88f5fcf55
[CELEBORN-207][FOLLOW_UP] Master could miss the congestion status if enable push.data.replicate 2023-02-07 22:57:39 +08:00
Rex(Hui) An
cfe81969c9
[CELEBORN-275] WrappedCallback should only handle response from replica (#1209) 2023-02-07 18:18:13 +08:00
Rex(Hui) An
bff6e91e0b
[CELEBORN-227] Support different push strategies to control the push speed (#1167) 2023-02-07 14:24:30 +08:00
Rex(Hui) An
bb113ec9be
[CELEBORN-207] Support network congestion control (#1066) 2023-02-07 12:06:18 +08:00
Angerszhuuuu
ae32c702b6
[CELEBORN-247][FOLLOWUP] Fix NPE issue (#1207) 2023-02-06 17:28:05 +08:00
Angerszhuuuu
46240d59de
[CELEBORN-247][FOLLOWUP] Add metrics for each user's quota usage (#1206) 2023-02-06 14:02:57 +08:00
Angerszhuuuu
c4020100db
[CELEBORN-271][BUG] PushState in PushDataHandler should use peer's location 2023-02-06 11:31:57 +08:00
Angerszhuuuu
ecc3a0e52f
[CELEBORN-272][BUG] Don't do replication should directly use callback not wrappedCallback (#1205) 2023-02-06 11:28:12 +08:00
zhongqiangchen
8e903840af
[CELEBORN-243][REWORK] fix bug that os's disk usage is low but celeborn thinks that it's high_disk_usage (#1202) 2023-02-04 14:27:44 +08:00
Angerszhuuuu
2e68912812
[CELEBORN-269][BUG] Disable replication throw NPE when removeBatch in pushDataHandler (#1203) 2023-02-03 20:06:59 +08:00
Angerszhuuuu
ff683ffc91
[CELEBORN-238][IMPROVEMENT] Revive caused by PUSH_DATA_TIMEOUT_MASTER and PUSH_DATA_TIMEOUT_SLAVE should add corresponding worker into blacklist (#1180) 2023-02-03 17:47:24 +08:00
Shuang
2634476758
[CELEBORN-267] reuse stream when client channel reconnected (#1200) 2023-02-03 15:12:45 +08:00
Angerszhuuuu
4b6f7e4593
[CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too (#1185) 2023-02-03 11:53:15 +08:00
Angerszhuuuu
04427f2b16
[CELEBORN-247] Add metrics for each user's quota usage (#1182) 2023-02-02 18:31:08 +08:00
Angerszhuuuu
ced08a1d89
[CELEBORN-266] Fix wrong old version configurations (#1198) 2023-02-02 14:03:45 +08:00
Angerszhuuuu
c410392284
[CELEBORN-265] Integration with Spark3.0 cast class exception of ShuffleHandler (#1197)
2023-02-02 11:52:51 +08:00
Ethan Feng
a43e3141bc
[CELEBORN-224][FOLLOWUP] Correct license and notices. (#1189) 2023-02-02 10:52:11 +08:00
zhongqiangczq
ff17a61ec5
[CELEBORN-243] fix bug that os's disk usage is low but celeborn thinks that it's high_disk_usage (#1184) 2023-02-02 10:41:11 +08:00
Rex(Hui) An
021004714b
[CELEBORN-264] InFlight requests should not be expired if it's not pushed yet (#1196) 2023-02-01 22:16:55 +08:00
Rex(Hui) An
e23f5ac679
[CELEBORN-258][FOLLOW UP] sbin/restart-worker.sh should also import the sbin/celeborn-config.sh
Co-authored-by: Hui An <hui.an@shopee.com>
2023-02-01 17:28:20 +08:00
Angerszhuuuu
98a5a3e16e
[CELEBORN-257][IMPROVEMENT] Avoid one hash searching when process message in TransportResponseHandler (#1193) 2023-02-01 14:59:53 +08:00
Angerszhuuuu
2577f09938
[CELEBORN-259] Correct wrong comment in restart.sh (#1194) 2023-02-01 14:54:13 +08:00
Rex(Hui) An
0f97fbf38d
[CELEBORN-258] sbin/restart-worker.sh should respect CELEBORN_WORKER_MEMORY and CELEBORN_WORKER_OFFHEAP_MEMORY (#1192)
Co-authored-by: Hui An <hui.an@shopee.com>
2023-02-01 14:36:31 +08:00
Angerszhuuuu
9ce48a648f
[CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs (#1190) 2023-02-01 11:12:16 +08:00
Keyong Zhou
a67a275609
Update README.md 2023-02-01 10:46:55 +08:00
Binjie Yang
da061b6c82
[HELM] Improve master/worker statefulset security context (#1187) 2023-01-31 19:53:09 +08:00
Cheng Pan
799e13d450
[CELEBORN-171][FOLLOWUP] Auto activation jdk-8 profile (#1191) 2023-01-31 19:51:33 +08:00
Shuang
7162be2fae
[CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker (#1149) 2023-01-31 18:53:36 +08:00
Keyong Zhou
54cf2e18d8
[CELEBORN-252] Delete slides (#1186) 2023-01-31 16:35:23 +08:00
Rex(Hui) An
6e82e7dd6c
[CELEBORN-253][MINOR] Fix the wrongly resolve celeborn.ha.master.node.id issue if enable HA (#1188) 2023-01-31 15:39:58 +08:00
Angerszhuuuu
1311fb53d1
[CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should have their own ERROR type (#1181)
* [CELEBORN-243][IMPROVEMENT] Create push client failed should have a ERROR type
2023-01-30 17:47:22 +08:00
Angerszhuuuu
122da47815
[CELEBORN-241][IMPROVEMENT] limit inflight push timeout should > push data timeout (#1179) 2023-01-30 11:57:07 +08:00
Kaijie Chen
3da338a716
[CELEBORN-248] Non-ASCII characters in source code (#1183) 2023-01-29 21:07:41 +08:00
Angerszhuuuu
8611a64400
[CELEBORN-237][IMPROVEMENT] push failed error message should show partition info (#1178)
2023-01-28 18:41:54 +08:00
nafiy
d6d537df93
[CELEBORN-229][FOLLOWUP] Support collect metrics with customized labels (#1174) 2023-01-28 16:02:58 +08:00
Keyong Zhou
e47f1e33b0
[CELEBORN-55][FOLLOWUP] Code refine (#1175) 2023-01-20 16:22:47 +08:00
zy.jordan
c5be79ee3d
[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker (#1102) 2023-01-20 10:18:45 +08:00
nafiy
e09b629da2
[CELEBORN-229][FEATURE] Support collect metrics with customized labels (#1173) 2023-01-19 11:59:48 +08:00
Keyong Zhou
dfa81c92df
[CELEBORN-224] Correct LICENSE and NOTICE. (#1164) (#1170) 2023-01-18 19:47:42 +08:00
Kaijie Chen
2b6822e3c7
[CELEBORN-230] AppDiskUsageSnapShot overrides equals() without override hashCode() (#1172) 2023-01-18 17:21:32 +08:00