659 Commits

Author SHA1 Message Date
Ethan Feng
3aacede5f8
[CELEBORN-283] Derive network layer for flink plugin. (#1222) 2023-02-17 14:12:54 +08:00
zhongqiangchen
5236df68af
[CELEBORN-292] optimize mappartitionfilewriter flushing index and reading data header (#1225) 2023-02-17 13:42:28 +08:00
zhongqiangchen
79096d60d0
[CELEBORN-293] WorkerSource registers timer for mappartition message metrics (#1226) 2023-02-17 11:29:54 +08:00
Shuang
b7ef9cf216
[CELEBORN-297] don't cache file groups for map partition shuffle type (#1237) 2023-02-17 11:28:47 +08:00
Ethan Feng
1dcfdb0c8f
[CELEBORN-281] Add metrics about buffer stream read buffer. (#1216) 2023-02-17 11:20:07 +08:00
Keyong Zhou
89b4eab3b6
[CELEBORN-309] Fix some potential concurrent issues in InFlightRequestTracker (#1243) 2023-02-17 10:01:19 +08:00
Angerszhuuuu
57f775a7e9
[CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistency (#1208) 2023-02-16 18:27:37 +08:00
Ethan Feng
a364fb27b2
[CELEBORN-282] Add BacklogAnnouncement RPC. (#1217) 2023-02-16 14:58:39 +08:00
Ethan Feng
534853bf8a
[CELEBORN-278] Add openStreamWithCredit RPC. (#1214) 2023-02-16 14:07:13 +08:00
zhongqiangchen
2c508dae0f
[CELEBORN-307] fix ArrayComparisonFailure while running lz4 ut (#1241) 2023-02-16 13:41:17 +08:00
jiaoqingbo
318157e3e9
[CELEBORN-305] Change the parameter passed in the registerShuffle method to numPartitions instead of numMappers (#1240) 2023-02-15 17:35:43 +08:00
jiaoqingbo
bd9e0ddc1f
[CELEBORN-304] Missing setIfMissing celeborn.$module.io.serverThreads (#1238) 2023-02-15 15:49:08 +08:00
Shuang
75c83093f2
[CELEBORN-296] fix map partition commit using wrong partitionId and result (#1233) 2023-02-14 20:54:06 +08:00
Ethan Feng
d391e7d91d
[CELEBORN-300] fix a bug about non-leader master try to update partition size. (#1235) 2023-02-14 15:44:20 +08:00
zhongqiangchen
4fb5b3d547
[CELEBORN-298] Fix the wrong configuration name in readme and conf.template (#1234) 2023-02-14 13:38:03 +08:00
Rex(Hui) An
2068e6ae37
[CELEBORN-279] Add user level push data speed metric (#1213) 2023-02-13 12:04:44 +08:00
Ethan Feng
19bcea217e
[CELEBORN-286] Fix source package contains license-binary. (#1219) 2023-02-13 11:44:14 +08:00
jiaoqingbo
3a92b0d911
[CELEBORN-284] fix typo in CelebornConf (#1218)
Co-authored-by: jiaoqb <jiaoqb@asiainfo.com>
2023-02-10 14:59:36 +08:00
Rex(Hui) An
adb6592d31
[CELEBORN-277] PushDataHandle callback could miss soft split status (#1212) 2023-02-09 14:57:18 +08:00
Angerszhuuuu
dae58a664c
[CELEBORN-239][FOLLOWUP] PUSH_DATA_TIMEOUT_MASTER/SLAVE should support convert through RPC (#1211)
2023-02-08 17:16:29 +08:00
Rex(Hui) An
f88f5fcf55
[CELEBORN-207][FOLLOW_UP] Master could miss the congestion status if enable push.data.replicate 2023-02-07 22:57:39 +08:00
Rex(Hui) An
cfe81969c9
[CELEBORN-275] WrappedCallback should only handle response from replica (#1209) 2023-02-07 18:18:13 +08:00
Rex(Hui) An
bff6e91e0b
[CELEBORN-227] Support different push strategies to control the push speed (#1167) 2023-02-07 14:24:30 +08:00
Rex(Hui) An
bb113ec9be
[CELEBORN-207] Support network congestion control (#1066) 2023-02-07 12:06:18 +08:00
Angerszhuuuu
ae32c702b6
[CELEBORN-247][FOLLOWUP] Fix NPE issue (#1207) 2023-02-06 17:28:05 +08:00
Angerszhuuuu
46240d59de
[CELEBORN-247][FOLLOWUP] Add metrics for each user's quota usage (#1206) 2023-02-06 14:02:57 +08:00
Angerszhuuuu
c4020100db
[CELEBORN-271][BUG] PushState in PushDataHandler should use peer's location 2023-02-06 11:31:57 +08:00
Angerszhuuuu
ecc3a0e52f
[CELEBORN-272][BUG] Don't do replication should directly use callback not wrappedCallback (#1205) 2023-02-06 11:28:12 +08:00
zhongqiangchen
8e903840af
[CELEBORN-243][REWORK]fix bug that os's disk usage is low but celeborn thinks that it's high_disk_usage (#1202) 2023-02-04 14:27:44 +08:00
Angerszhuuuu
2e68912812
[CELEBORN-269][BUG] Disable replication throw NPE when removeBatch in pushDataHandler (#1203) 2023-02-03 20:06:59 +08:00
Angerszhuuuu
ff683ffc91
[CELEBORN-238][IMPROVEMENT] Revive caused by PUSH_DATA_TIMEOUT_MASTER and PUSH_DATA_TIMEOUT_SLAVE should add corresponding worker into blacklist (#1180) 2023-02-03 17:47:24 +08:00
Shuang
2634476758
[CELEBORN-267] reuse stream when client channel reconnected (#1200) 2023-02-03 15:12:45 +08:00
Angerszhuuuu
4b6f7e4593
[CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too (#1185) 2023-02-03 11:53:15 +08:00
Angerszhuuuu
04427f2b16
[CELEBORN-247] Add metrics for each user's quota usage (#1182) 2023-02-02 18:31:08 +08:00
Angerszhuuuu
ced08a1d89
[CELEBORN-266] Fix wrong old version configurations (#1198) 2023-02-02 14:03:45 +08:00
Angerszhuuuu
c410392284
[CELEBORN-265] Integration with Spark3.0 cast class exception of ShuffleHandler (#1197)
2023-02-02 11:52:51 +08:00
Ethan Feng
a43e3141bc
[CELEBORN-224][FOLLOWUP] Correct license and notices. (#1189) 2023-02-02 10:52:11 +08:00
zhongqiangczq
ff17a61ec5
[CELEBORN-243] fix bug that os's disk usage is low but celeborn thinks that it's high_disk_usage (#1184) 2023-02-02 10:41:11 +08:00
Rex(Hui) An
021004714b
[CELEBORN-264] InFlight requests should not be expired if it's not pushed yet (#1196) 2023-02-01 22:16:55 +08:00
Rex(Hui) An
e23f5ac679
[CELEBORN-258][FOLLOW UP] sbin/restart-worker.sh should also import the sbin/celeborn-config.sh
Co-authored-by: Hui An <hui.an@shopee.com>
2023-02-01 17:28:20 +08:00
Angerszhuuuu
98a5a3e16e
[CELEBORN-257][IMPROVEMENT] Avoid one hash searching when process message in TransportResponseHandler (#1193) 2023-02-01 14:59:53 +08:00
Angerszhuuuu
2577f09938
[CELEBORN-259] Correct wrong comment in restart.sh (#1194) 2023-02-01 14:54:13 +08:00
Rex(Hui) An
0f97fbf38d
[CELEBORN-258] sbin/restart-worker.sh should respect CELEBORN_WORKER_MEMORY and CELEBORN_WORKER_OFFHEAP_MEMORY (#1192)
Co-authored-by: Hui An <hui.an@shopee.com>
2023-02-01 14:36:31 +08:00
Angerszhuuuu
9ce48a648f
[CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs (#1190) 2023-02-01 11:12:16 +08:00
Keyong Zhou
a67a275609
Update README.md 2023-02-01 10:46:55 +08:00
Binjie Yang
da061b6c82
[HELM] Improve master/worker statefulset security context (#1187) 2023-01-31 19:53:09 +08:00
Cheng Pan
799e13d450
[CELEBORN-171][FOLLOWUP] Auto activation jdk-8 profile (#1191) 2023-01-31 19:51:33 +08:00
Shuang
7162be2fae
[CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker (#1149) 2023-01-31 18:53:36 +08:00
Keyong Zhou
54cf2e18d8
[CELEBORN-252] Delete slides (#1186) 2023-01-31 16:35:23 +08:00
Rex(Hui) An
6e82e7dd6c
[CELEBORN-253][MINOR] Fix the wrongly resolved celeborn.ha.master.node.id issue if enable HA (#1188) 2023-01-31 15:39:58 +08:00