Commit Graph

211 Commits

Author SHA1 Message Date
Angerszhuuuu
786fcd6744
[CELEBORN-336] Revive Failed should use keep the corresponding StatusCode (#1283)
* [CELEBORN-336] Revive Failed should use keep the corresponding StatusCode
2023-03-01 18:57:51 +08:00
Shuang
bc7da3154f
[CELEBORN-354][Flink] fix succeedPartitionIds may contain new added partitionIds (#1289) 2023-03-01 15:45:24 +08:00
Angerszhuuuu
eda21ead24
[CELEBORN-344] Change PUSH_DATA_FAIL_MASTER/SALVE to PUSH_DATA_WRITE_FAIL_MASTER/SALVE (#1281) 2023-02-28 11:29:40 +08:00
Keyong Zhou
7adf1fca41
[CELEBORN-295] Optimize data push (#1232)
* [CELEBORN-295] Add double buffer for sort pusher
2023-02-28 10:35:55 +08:00
Angerszhuuuu
24f5478adc
[CELEBORN-338] Clean duplicated exception message of handling push data (#1274) 2023-02-28 10:35:18 +08:00
Shuang
935806f036
[CELEBORN-341][Flink] cache file group for map partition in Flink plugin (#1277) 2023-02-26 20:31:20 +08:00
Angerszhuuuu
a7587c3fe7
[CELEBORN-337] Remove unnecessary StatusCode.message (#1272)
* [CELEBORN-337] Remove unnecessary StatusCode.message
2023-02-24 15:11:07 +08:00
Angerszhuuuu
81f7ffd767
[CELEBORN-332] Unify the log of ShuffleClientImpl (#1267)
* [CELEBORN-332] Unify the log of ShuffleClientImpl
2023-02-24 14:07:25 +08:00
Angerszhuuuu
3067efcfd3
[CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too (#1266)
* [CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too
2023-02-23 14:57:11 +08:00
Angerszhuuuu
f7948190cf
[CELEBORN-316][FOLLOWUP] Should not wrap CelebornIOException with CelebornIOException (#1264) 2023-02-23 11:48:46 +08:00
Angerszhuuuu
1132cc25ab
[CELEBORN-328][MPROVEMENT] Too much noisy log when reserve slot failed (#1262) 2023-02-22 17:19:52 +08:00
Angerszhuuuu
322f0d2b41
[CELEBORN-316] Wrap Celeborn exception with CelebornIOException (#1253) 2023-02-22 16:10:11 +08:00
Shuang
3da615972e
[CELEBORN-326)] [Flink] lifecycleManager supports flink-yarn-session mode to handle multiple Flink jobs. (#1260) 2023-02-22 15:37:24 +08:00
Angerszhuuuu
251b923b5b
[CELEBORN-321] When register shuffle failed, DataPushQueue should directly take the task queue to avoid NPE (#1258) 2023-02-21 17:02:37 +08:00
Shuang
61065230bd
[CELEBORN-311] not retry when register for map partition occurs exception (#1246) 2023-02-21 10:16:10 +08:00
Ethan Feng
bfb39632d9
[CELEBORN-235] Implement flink plugin. (#1244) 2023-02-17 19:31:12 +08:00
zhongqiangchen
b5dc106af8
[CELEBORN-291] optimize shuffleclientimpl creating client and pushdata for mappartition (#1224) 2023-02-17 19:07:19 +08:00
Shuang
b7ef9cf216
[CELEBORN-297] don't cache file groups for map partition shuffle type (#1237) 2023-02-17 11:28:47 +08:00
Angerszhuuuu
57f775a7e9
[CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistence (#1208) 2023-02-16 18:27:37 +08:00
jiaoqingbo
318157e3e9
[CELEBORN-305] Change the parameter passed in the registerShuffle method to numPartitions instead of numMappers (#1240) 2023-02-15 17:35:43 +08:00
jiaoqingbo
bd9e0ddc1f
[CELEBORN-304] Missing setIfMissing celeborn.$module.io.serverThreads (#1238) 2023-02-15 15:49:08 +08:00
Shuang
75c83093f2
[CELEBORN-296] fix map partition commit using wrong partitionId and result (#1233) 2023-02-14 20:54:06 +08:00
Rex(Hui) An
bff6e91e0b
[CELEBORN-227] Support different push strategies to control the push speed (#1167) 2023-02-07 14:24:30 +08:00
Angerszhuuuu
ff683ffc91
[CELEBORN-238][IMPROVEMENT] Revive caused by PUSH_DATA_TIMEOUT_MASTER and PUSH_DATA_TIMEOUT_SLAVE should add corresponding worker into blacklist (#1180) 2023-02-03 17:47:24 +08:00
Angerszhuuuu
4b6f7e4593
[CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too (#1185) 2023-02-03 11:53:15 +08:00
Rex(Hui) An
021004714b
[CELEBORN-264] InFlight requests should not be expired if it's not pushed yet (#1196) 2023-02-01 22:16:55 +08:00
Shuang
7162be2fae
[CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker (#1149) 2023-01-31 18:53:36 +08:00
Angerszhuuuu
1311fb53d1
[CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should have their own ERROR type (#1181)
* [CELEBORN-243][IMPROVEMENT] Create push client failed should have a ERROR type
2023-01-30 17:47:22 +08:00
Angerszhuuuu
8611a64400
[CELEBORN-237][IMPROVEMENT] push failed error message should show partition info (#1178)
* [CELEBORN-237][IMPROVEMENT] push failed error message should show partition info
2023-01-28 18:41:54 +08:00
Keyong Zhou
e47f1e33b0
[CELEBORN-55][FOLLOWUP] Code refine (#1175) 2023-01-20 16:22:47 +08:00
zy.jordan
c5be79ee3d
[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker (#1102) 2023-01-20 10:18:45 +08:00
zhongqiangczq
1836fe187b
[CELEBORN-197] in mappartition, check transportClient whether changed while sending messages (#1145) 2023-01-13 16:45:26 +08:00
Shuang
810a8d01e0
[CELEBORN-212] refresh client if current client is inactive. (#1159) 2023-01-11 11:54:50 +08:00
Shuang
1332362bff
[CELEBORN-213] Add configuration for whether to close idle connections in client side (#1157) 2023-01-10 19:13:33 +08:00
Angerszhuuuu
e155ec122a
[CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice (#1136) 2023-01-10 11:39:40 +08:00
Shuang
2ec06472fe
[CELEBORN-203] fix NPE when removeExpiredShuffle in LifecycleManager. (#1151) 2023-01-06 18:32:17 +08:00
Angerszhuuuu
0d5809ff0c
[CELEBORN-192][IMPROVEMENT] Change FAILED status to REQUEST_FAILED since it's all used when RPC request failed. (#1139) 2023-01-06 16:53:04 +08:00
Shuang
3b2be25a50
[CELEBORN-173] refactor minicluster and fix ut (#1147) 2023-01-05 20:39:19 +08:00
Angerszhuuuu
415452d9c4
[CELEBORN-189][IMPROVEMENT] PushDataFailedSlave should add slave worker to blacklist (#1135) 2023-01-05 20:12:07 +08:00
Angerszhuuuu
fe8dfb05f3
[CELEBORN-196][REFACTOR] Rename batchHandleRequestPartitions to handleRequestPartitions (#1144) 2023-01-05 14:37:10 +08:00
Angerszhuuuu
2315f2f988
[CELEBORN-191][BUG] ShuffleClient registerShuffle return RESERVE_SLOTS_FAILED should also been print out (#1138) 2023-01-03 17:13:31 +08:00
Shuang
5cba307189
[CELEBORN-146] refactor ShuffleMapperAttempts & GetReducerFileGroup (#1116) 2022-12-30 18:15:23 +08:00
Cheng Pan
b8758a7cb6
[CELEBORN-181][TEST] Rename RssFunSuite to CelebornFunSuite (#1125) 2022-12-29 18:10:14 +08:00
RexAn
6432a129be
[CELEBORN-61][CELEBORN-62][FOLLOW_UP] Fix some issues for slow start (#1119) 2022-12-29 12:07:20 +08:00
Binjie Yang
63943cd5cc
[CELEBORN-147][IT]Extraction of common integration test cases (#1092) 2022-12-29 12:03:09 +08:00
Keyong Zhou
2f0682265e
[CELEBORN-119] Add timeout for pushdata (#1097) 2022-12-20 20:40:42 +08:00
Keyong Zhou
a2dd72f20c
[CELEBORN-155] Wrong TimeUnit for registerShuffleRetryWait in Shuffle… (#1099) 2022-12-19 17:32:18 +08:00
Shuang
13769f0f0a
[CELEBORN-121] Refactor batchHandleCommitPartition (#1089) 2022-12-19 12:39:39 +08:00
Ethan Feng
39394526a8
[CELEBORN-142]Keep committed partition locations semantic consistent when commit files on HDFS. (#1091) 2022-12-16 19:02:02 +08:00
nafiy
ddab27a1d7
[CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse (#1093)
* [CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse
2022-12-15 18:16:34 +08:00