Commit Graph

312 Commits

Author SHA1 Message Date
Shuang
935806f036
[CELEBORN-341][Flink] cache file group for map partition in Flink plugin (#1277) 2023-02-26 20:31:20 +08:00
Angerszhuuuu
a7587c3fe7
[CELEBORN-337] Remove unnecessary StatusCode.message (#1272)
* [CELEBORN-337] Remove unnecessary StatusCode.message
2023-02-24 15:11:07 +08:00
Angerszhuuuu
81f7ffd767
[CELEBORN-332] Unify the log of ShuffleClientImpl (#1267)
* [CELEBORN-332] Unify the log of ShuffleClientImpl
2023-02-24 14:07:25 +08:00
Angerszhuuuu
3067efcfd3
[CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too (#1266)
* [CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too
2023-02-23 14:57:11 +08:00
Angerszhuuuu
f7948190cf
[CELEBORN-316][FOLLOWUP] Should not wrap CelebornIOException with CelebornIOException (#1264) 2023-02-23 11:48:46 +08:00
Angerszhuuuu
1132cc25ab
[CELEBORN-328][IMPROVEMENT] Too much noisy log when reserve slot failed (#1262) 2023-02-22 17:19:52 +08:00
Angerszhuuuu
322f0d2b41
[CELEBORN-316] Wrap Celeborn exception with CelebornIOException (#1253) 2023-02-22 16:10:11 +08:00
Shuang
3da615972e
[CELEBORN-326] [Flink] lifecycleManager supports flink-yarn-session mode to handle multiple Flink jobs. (#1260) 2023-02-22 15:37:24 +08:00
Angerszhuuuu
251b923b5b
[CELEBORN-321] When register shuffle failed, DataPushQueue should directly take the task queue to avoid NPE (#1258) 2023-02-21 17:02:37 +08:00
Shuang
61065230bd
[CELEBORN-311] not retry when register for map partition occurs exception (#1246) 2023-02-21 10:16:10 +08:00
Ethan Feng
bfb39632d9
[CELEBORN-235] Implement flink plugin. (#1244) 2023-02-17 19:31:12 +08:00
zhongqiangchen
b5dc106af8
[CELEBORN-291] optimize shuffleclientimpl creating client and pushdata for mappartition (#1224) 2023-02-17 19:07:19 +08:00
Shuang
b7ef9cf216
[CELEBORN-297] don't cache file groups for map partition shuffle type (#1237) 2023-02-17 11:28:47 +08:00
Angerszhuuuu
57f775a7e9
[CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistence (#1208) 2023-02-16 18:27:37 +08:00
jiaoqingbo
318157e3e9
[CELEBORN-305] Change the parameter passed in the registerShuffle method to numPartitions instead of numMappers (#1240) 2023-02-15 17:35:43 +08:00
jiaoqingbo
bd9e0ddc1f
[CELEBORN-304] Missing setIfMissing celeborn.$module.io.serverThreads (#1238) 2023-02-15 15:49:08 +08:00
Shuang
75c83093f2
[CELEBORN-296] fix map partition commit using wrong partitionId and result (#1233) 2023-02-14 20:54:06 +08:00
Rex(Hui) An
bff6e91e0b
[CELEBORN-227] Support different push strategies to control the push speed (#1167) 2023-02-07 14:24:30 +08:00
Angerszhuuuu
ff683ffc91
[CELEBORN-238][IMPROVEMENT] Revive caused by PUSH_DATA_TIMEOUT_MASTER and PUSH_DATA_TIMEOUT_SLAVE should add corresponding worker into blacklist (#1180) 2023-02-03 17:47:24 +08:00
Angerszhuuuu
4b6f7e4593
[CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too (#1185) 2023-02-03 11:53:15 +08:00
Rex(Hui) An
021004714b
[CELEBORN-264] InFlight requests should not be expired if it's not pushed yet (#1196) 2023-02-01 22:16:55 +08:00
Shuang
7162be2fae
[CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker (#1149) 2023-01-31 18:53:36 +08:00
Angerszhuuuu
1311fb53d1
[CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should have their own ERROR type (#1181)
* [CELEBORN-243][IMPROVEMENT] Create push client failed should have a ERROR type
2023-01-30 17:47:22 +08:00
Angerszhuuuu
8611a64400
[CELEBORN-237][IMPROVEMENT] push failed error message should show partition info (#1178)
* [CELEBORN-237][IMPROVEMENT] push failed error message should show partition info
2023-01-28 18:41:54 +08:00
Keyong Zhou
e47f1e33b0
[CELEBORN-55][FOLLOWUP] Code refine (#1175) 2023-01-20 16:22:47 +08:00
zy.jordan
c5be79ee3d
[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker (#1102) 2023-01-20 10:18:45 +08:00
zhongqiangczq
1836fe187b
[CELEBORN-197] in mappartition, check transportClient whether changed while sending messages (#1145) 2023-01-13 16:45:26 +08:00
Shuang
810a8d01e0
[CELEBORN-212] refresh client if current client is inactive. (#1159) 2023-01-11 11:54:50 +08:00
Shuang
1332362bff
[CELEBORN-213] Add configuration for whether to close idle connections in client side (#1157) 2023-01-10 19:13:33 +08:00
Angerszhuuuu
e155ec122a
[CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice (#1136) 2023-01-10 11:39:40 +08:00
Shuang
2ec06472fe
[CELEBORN-203] fix NPE when removeExpiredShuffle in LifecycleManager. (#1151) 2023-01-06 18:32:17 +08:00
Angerszhuuuu
0d5809ff0c
[CELEBORN-192][IMPROVEMENT] Change FAILED status to REQUEST_FAILED since it's all used when RPC request failed. (#1139) 2023-01-06 16:53:04 +08:00
Shuang
3b2be25a50
[CELEBORN-173] refactor minicluster and fix ut (#1147) 2023-01-05 20:39:19 +08:00
Angerszhuuuu
415452d9c4
[CELEBORN-189][IMPROVEMENT] PushDataFailedSlave should add slave worker to blacklist (#1135) 2023-01-05 20:12:07 +08:00
Angerszhuuuu
fe8dfb05f3
[CELEBORN-196][REFACTOR] Rename batchHandleRequestPartitions to handleRequestPartitions (#1144) 2023-01-05 14:37:10 +08:00
Angerszhuuuu
2315f2f988
[CELEBORN-191][BUG] ShuffleClient registerShuffle return RESERVE_SLOTS_FAILED should also been print out (#1138) 2023-01-03 17:13:31 +08:00
Shuang
5cba307189
[CELEBORN-146] refactor ShuffleMapperAttempts & GetReducerFileGroup (#1116) 2022-12-30 18:15:23 +08:00
Cheng Pan
b8758a7cb6
[CELEBORN-181][TEST] Rename RssFunSuite to CelebornFunSuite (#1125) 2022-12-29 18:10:14 +08:00
RexAn
6432a129be
[CELEBORN-61][CELEBORN-62][FOLLOW_UP] Fix some issues for slow start (#1119) 2022-12-29 12:07:20 +08:00
Binjie Yang
63943cd5cc
[CELEBORN-147][IT]Extraction of common integration test cases (#1092) 2022-12-29 12:03:09 +08:00
Keyong Zhou
2f0682265e
[CELEBORN-119] Add timeout for pushdata (#1097) 2022-12-20 20:40:42 +08:00
Keyong Zhou
a2dd72f20c
[CELEBORN-155] Wrong TimeUnit for registerShuffleRetryWait in Shuffle… (#1099) 2022-12-19 17:32:18 +08:00
Shuang
13769f0f0a
[CELEBORN-121] Refactor batchHandleCommitPartition (#1089) 2022-12-19 12:39:39 +08:00
Ethan Feng
39394526a8
[CELEBORN-142]Keep committed partition locations semantic consistent when commit files on HDFS. (#1091) 2022-12-16 19:02:02 +08:00
nafiy
ddab27a1d7
[CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse (#1093)
* [CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse
2022-12-15 18:16:34 +08:00
Ethan Feng
65cb36c002
[CELEBORN-83][FOLLOWUP] Fix various bugs when using HDFS as storage. (#1065) 2022-12-15 15:20:29 +08:00
Shuang
e3576e4e7a
[CELEBORN-117] refactor CommitManager, implements M/R Partition Commi… (#1060) 2022-12-15 11:09:59 +08:00
Cheng Pan
ec371c0026
[CELEBORN-132] ShuffleClient should not implement Cloneable (#1077) 2022-12-14 10:04:39 +08:00
Angerszhuuuu
c924a4ff0d
[CELEBORN-61][CELEBORN-62][FEATURE] Shuffle client support slow start, congestion avoidance and congestion control (#1052) 2022-12-08 12:41:34 +08:00
zhongqiangczq
60f6f87832
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write:pushdata (#1036) 2022-12-08 12:31:47 +08:00
zhongqiangczq
d3d40f730c
[CELEBORN-106] flink-plugin supports shufflewrite:OutputGate (#1051) 2022-12-08 11:24:37 +08:00
Shuang
e2196e9383
[CELEBORN-56] [ISSUE-945] handle map partition mapper end (#1003) 2022-12-07 21:09:02 +08:00
Shuang
f3f104870c
[CELEBORN-75] Initialize flink plugin module (#1027) 2022-12-07 15:53:00 +08:00
Angerszhuuuu
0d38bad78a
[CELEBORN-20][REFACTOR] Extract CommitManager from LifecycleManager (#1050) 2022-12-06 22:26:18 +08:00
Angerszhuuuu
1e4dec96b9
[CELEBORN-21][REFACTOR] Extract revive related logical from LifecycleManager (#1024)
* [CELEBORN-21][REFACTOR] Extract revive related logical from LifecycleManager
2022-12-05 17:05:17 +08:00
Angerszhuuuu
5eaad136a0
[CELEBORN-84][IMPROVEMENT] Blacklist critical reason should avoid been covered by normal reason (#1043)
* [CELEBORN-84][IMPROVEMENT] Blacklist critical reason should avoid been covered by normal reason
2022-12-05 14:02:33 +08:00
nafiy
8e384cda5a
[CELEBORN-88][REFACTOR] Revive/PartitionSplit should set separated timeout configuration (#1046) 2022-12-05 10:36:43 +08:00
nafiy
44d45c2a27
[CELEBORN-90][REFACTOR] GetReducerFileGroup should support separated timeout configuration (#1045) 2022-12-02 22:53:51 +08:00
Shuang
3a4c3c03a0
[CELEBORN-76][FOLLOWUP] fix inFlightCommitRequest counting problem (#1034)
* [CELEBORN-76][FOLLOWUP] fix inFlightCommitRequest counting problem
2022-12-02 16:25:59 +08:00
nafiy
13e1e24035
[CELEBORN-86][REFACTOR] Register shuffle should have separated timeout configuration (#1031)
* [CELEBORN-86][REFACTOR] Register shuffle should have separated timeout configuration
2022-12-01 18:39:56 +08:00
zhongqiangczq
898d1126a6
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write: send handshake/regionstart/regionfinish (#1035) 2022-12-01 11:20:55 +08:00
RexAn
bb5a4d2180
[CELEBORN-63] Add CONGESTION related status codes (#1028)
* Increase push data return reason types such as CONGESTION etc.
2022-12-01 10:55:37 +08:00
Angerszhuuuu
7f8e66afbc
[CELEBORN-76][FOLLOWUP] Support batch commit hard split partition before stage end (#1030)
* [CELEBORN-76][FOLLOWUP] Support batch commit hard split partition before stage end
2022-11-30 19:42:04 +08:00
Ethan Feng
dd02070e4b
[CELEBORN-83] Fix various bug when using HDFS as storage.
1. fix incompatibility between Hadoop 2 and Hadoop 3.
2. fix hdfs writer will never be called when there are no healthy disks.
3. fix an NPE when HDFS file writer close.
2022-11-30 19:33:18 +08:00
Angerszhuuuu
5ad4415c68
[CELEBORN-78][REFACTOR] Extract heartbearter from LifecycleManager (#1021)
* [CELEBORN-78][REFACTOR] Extract heartbearter from LifecycleManager
2022-11-29 19:14:55 +08:00
Angerszhuuuu
01dc9d4259
[CELEBORN-79][REFACTOR] Remove unused responseCheckerThread from LifecycleManager (#1022) 2022-11-29 15:25:37 +08:00
Angerszhuuuu
d26e73209b
[CELEBORN-76] Support batch commit hard split partition before stage end 2022-11-29 13:09:01 +08:00
Angerszhuuuu
13f4ce2be6
[CELEBORN-68][FOLLOWUP] Retry on same partition location should have a retry wait interval (#1017) 2022-11-28 20:17:08 +08:00
Keyong Zhou
d381df71f8
[CELEBORN-70] Add epoch for each commitFiles request (#1012) 2022-11-27 21:05:14 +08:00
nafiy
817eee969f
[CELEBORN-58][REFACTOR] Aggregate reserve failed logs together (#1005) 2022-11-26 20:56:39 +08:00
Keyong Zhou
f8bb2cd47d
[CELEBORN-12]Retry on CommitFile request (#1011) 2022-11-26 20:56:24 +08:00
Keyong Zhou
9214b82181
[CELEBORN-68] Client might fetch incorrect data chunk (#1010) 2022-11-26 18:06:06 +08:00
Ethan Feng
93dbf3f8b1
[CELEBORN-67] Revert "Fix fetch incorrect data chunk" related commits (#1006)
* Revert "[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999)"

This reverts commit 1e8f6dc5e8.

* Revert "[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000)"

This reverts commit f1c4d675d6.

* Revert "[CELEBORN-49] Deadlock when kill worker in shuffle read (#998)"

This reverts commit 0be4b3399c.

* Revert "[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995)"

This reverts commit 2b05228871.

* Revert "[BUG] Fix fetch incorrect data chunk (#926)"

This reverts commit 6f043f8a

* Revert "[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback (#954)"

This reverts commit 64e8ebf1
2022-11-25 20:57:47 +08:00
nafiy
fe13e9e261
[CELEBORN-59][REFACTOR] Support send destroy slots request in parallel (#1004) 2022-11-25 18:26:05 +08:00
Angerszhuuuu
1e8f6dc5e8
[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999)
* [CELEBORN-48][BUG] Channel inactive may cause new client use old stream id to fetch data
2022-11-23 18:22:06 +08:00
Ethan Feng
f1c4d675d6
[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000) 2022-11-23 18:07:57 +08:00
Keyong Zhou
0be4b3399c
[CELEBORN-49] Deadlock when kill worker in shuffle read (#998) 2022-11-23 17:31:05 +08:00
Angerszhuuuu
2b05228871
[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995) 2022-11-23 11:56:10 +08:00
Keyong Zhou
cfc1fa15bd
[CELEBORN-46] Refine log for RssInputStream.close() (#994) 2022-11-22 22:01:08 +08:00
Shuang
1656458788
[CELEBORN-14] [ISSUE-955] support register attempt map task (#984) 2022-11-22 15:23:20 +08:00
Angerszhuuuu
5ec278f99a
[ISSUE-987][FEATURE] During worker shutdown, return HARD_SPLIT for all existed partition (#988) 2022-11-22 14:29:55 +08:00
Shuang
fb6d1de108
[CELEBORN-8] [ISSUE-952][FEATURE] support register shuffle task in map partition mode (#973) 2022-11-16 21:46:19 +08:00
Angerszhuuuu
64e8ebf158
[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback (#954) 2022-11-11 14:00:47 +08:00
leesf
0b8376e2c7
Cleanup some code (#943) 2022-11-11 13:58:39 +08:00
Ethan Feng
6f043f8ae9
[BUG] Fix fetch incorrect data chunk (#926) 2022-11-09 22:31:39 +08:00
leesf
3699683a3b
Fix and migrate some configs (#927) 2022-11-07 09:41:38 +08:00
Angerszhuuuu
38e15d89e6
[ISSUE-902][IMPROVEMENT][FOLLOWUP] LifecycleManager should reserve blacklist with irrecoverable status (#914) 2022-11-04 15:54:45 +08:00
Angerszhuuuu
e68ca75a9e
[ISSUE-902][BUG] LifecycleManager should not reallocate slots in failed worker during retry (#906) 2022-11-02 21:07:28 +08:00
leesf
f1694f3d20
[MINOR][CLEANUP] clean up some code in LifecycleManager and ShuffleClientImpl (#896) 2022-11-01 11:40:19 +08:00
Angerszhuuuu
87fcfa767f
[ISSUE-887][REFACTOR] Configuration type convert to Enum (#888)
* [ISSUE-332][FOLLOWUP] Add deps in worker's pom

* [Refactor] Modify package name of utils to keep consistence

* [Refactor] Modify package name of utils to keep consistence

* [REFACTOR] Remove unused isRegistered in controller

* [ISSUE-887][REFACTOR] Configuration type convert to Enum

* update

* update

* Update RssShuffleManager.java
2022-10-29 13:41:06 +08:00
Cheng Pan
d7be6006e7
Migrate network related conf to structured conf system (#875)
* Migrate network related conf to structured conf system

* migrate

* fix

* fix

* worker

* fix

* nit

* review

* nit
2022-10-28 10:45:52 +08:00
Angerszhuuuu
f9ecde3b2b
[ISSUE-863][BUG]LifecycleManager should ignore change partition request when shuffle ended and not remove workersnapshot when commit success (#864) 2022-10-27 22:04:18 +08:00
Ethan Feng
8800fc4a8e
[Refactor] Refine rpc cache configs (#853)
* refine rpc cache configs.

* update.

* update.

* update.
2022-10-25 20:28:18 +08:00
Ethan Feng
45ef716737
[Feature] Cache GetReducerFileGroupResponse to avoid lifecycle manager oom. (#792) 2022-10-25 16:16:44 +08:00
AngersZhuuuu
2ebf873b3c
[ISSUE-845][REFACTOR] Migrate partition split related conf to Celeborn Configuration System (#846)
[ISSUE-845][REFACTOR] Migrate partition split related conf to Celeborn Configuration System
2022-10-25 10:54:45 +08:00
AngersZhuuuu
0bd0a3e9f4
[ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System (#848)
* [ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System

* Update CelebornConf.scala

* follow comments

* update

* update

* update

* Update client.md
2022-10-25 09:16:46 +08:00
AngersZhuuuu
0fdb19065a
[ISSUE-841][REFACTOR] Migrate shuffle client side conf to Celeborn Configuration System (#842) 2022-10-24 20:48:48 +08:00
Keyong Zhou
63752e7a37
[BUG] RegisterShuffle should not increase epoch (#833) 2022-10-23 23:40:32 +08:00
nafiy
d0058fb2c5
[ISSUE-780][REFACTOR] Refactor PartitionLocation's methods (#791) 2022-10-22 22:46:45 +08:00
AngersZhuuuu
f2610e3b6f
[ISSUE-829][REFACTOR] Unify name of PUSH_DATA_FAIL_MAIN (#830) 2022-10-21 19:06:33 +08:00
AngersZhuuuu
a773c8e6db
[ISSUE-820][Refactor] Rename RssConf to CelebornConf (#826) 2022-10-20 20:13:13 +08:00
AngersZhuuuu
8344479df1
[ISSUE-818][REFACTOR] Move existing RssConf.xxx conf method to RssConf class (#822)
* [ISSUE-818][REFACTOR] Move existing RssConf.xxx conf method to RssConf class


Co-authored-by: Ethan Feng <ethan.aquarius.fmx@gmail.com>
2022-10-20 18:10:59 +08:00
Ethan Feng
5c761a8df3
[ISSUE-813][Refactor] Refactor flusher configurations. (#813)
* Refactor flusher configurations.

* Refactor flusher configurations.

* Update.

* remove brackets.

* update docs.

* rename.

* update.

* update docs.

* update.

* update.

* update.

* update.

* update.

* update.

* update.

* format.

* update.

* update.
2022-10-20 15:23:17 +08:00
nafiy
a75bce905e
[ISSUE-805][REFACTOR] Remove UserIdentifier out of ControlMessage (#808) 2022-10-19 15:32:53 +08:00
AngersZhuuuu
7fedaaeca1
[ISSUE-795][BUG] Batch handle change partition throw NPE (#796) 2022-10-19 10:54:08 +08:00
Ethan Feng
bff2a7065b
Keep one copy of roaringbitmap to reduce memory usage. (#790) 2022-10-18 13:26:49 +08:00
Cheng Pan
efad4abb5d
Migrate a bunch of configurations (#786) 2022-10-18 10:44:01 +08:00
Cheng Pan
ea67f4e060
Introduce categories to ConfigEntry and migrate configurations (#775) 2022-10-17 16:56:54 +08:00
Cheng Pan
96e969f46e
[BUILD] Extract project.version to Maven Property (#772) 2022-10-16 19:01:40 +08:00
AngersZhuuuu
c9b462dc02
[ISSUE-770][Refactor] Batch handle change partition should ignore empty batch and avoid print log of empty result (#771) 2022-10-14 21:49:37 +08:00
AngersZhuuuu
3bad403c8b
[ISSUE-768][REFACTOR] Shuffle data lost should show more clear about lost data in which worker (#769) 2022-10-14 11:41:15 +08:00
Cheng Pan
f01a696313
Migrate and refactor configuration for master endpoints (#752) 2022-10-11 21:33:21 +08:00
AngersZhuuuu
bbb4f8e225
[ISSUE-306][IMPROVEMENT] Handle change partition request in batch (#622) 2022-10-10 18:31:37 +08:00
AngersZhuuuu
f2a234f870
[ISSUE-739][REFACTOR] Use object wrap pb message method (#740) 2022-10-09 11:53:48 +08:00
AngersZhuuuu
ae4bb12d5e
[ISSUE-630][REFACTOR] Minor change of storage resource quota, include code style, comment unused code etc.. (#728) 2022-10-08 20:15:25 +08:00
Ethan Feng
96e550f81c
Fix a npe that stuck lifecycle manager when a worker is offline. (#733) 2022-10-08 20:11:42 +08:00
Ethan Feng
6deda248ac
[REFACTOR]move lifecycle manager to correct package. (#730) 2022-10-08 18:14:08 +08:00
Cheng Pan
ab16b4f101
[INFRA] Rename modules w/ celeborn prefix (#723) 2022-10-08 08:05:57 +08:00
Cheng Pan
abb4ce6405
Drop control message Scala wrapper - Revive/PartitionSplit/ChangeLocationResponse (#720) 2022-10-07 12:40:23 +08:00
Cheng Pan
a719709a17
Drop control message Scala wrapper - UnregisterShuffle/UnregisterShuffleResponse (#718) 2022-10-07 12:29:10 +08:00
Cheng Pan
cda133e11f
Drop control message Scala wrapper - RegisterShuffle/RegisterShuffleResponse (#716) 2022-10-06 23:37:36 +08:00
Keyong Zhou
a2d2379153
[DOC] Replace RSS with Celeborn in docs (#715) 2022-10-06 10:37:46 +08:00
Cheng Pan
4880d78d6a
Extract spark tests and improve pom (#711) 2022-10-04 10:23:26 +08:00
Keyong Zhou
fe3b5988f2
[REFACTOR] Change package name to org.apache.celeborn (#710) 2022-10-02 18:10:29 +08:00
nafiy
5d4533fb85
[ISSUE-632][FEATURE] LifecycleManager side ReserveSlots & RequestSlots RPC with UserIdentifier (#679) 2022-09-27 00:01:44 +08:00
zky.zhoukeyong
a2522745d2
Revert "Drop control message Scala wrapper - RemoveExpiredShuffle (#676)"
This reverts commit a160cd90cb.
2022-09-25 17:18:41 +08:00
Cheng Pan
a160cd90cb
Drop control message Scala wrapper - RemoveExpiredShuffle (#676) 2022-09-24 23:23:36 +08:00
Ethan Feng
30d4323cdb
[FEATURE] Add a configuration to enable a map id filter mechanism. #662 (#663) 2022-09-23 18:38:52 +08:00
Ethan Feng
4a7a7d42b5
[FEATURE] Add metrics about fetch chunk size, commit files time and get reducer file time (#661) 2022-09-23 16:05:28 +08:00
Ethan Feng
b4654d788c
[ISSUE-607]Add map ids info for each PartitionLocation to enable filtering for m… (#619) 2022-09-23 15:21:41 +08:00
AngersZhuuuu
a6b8af2b00
[ISSUE-637][FEATURE] Change CheckAlive to CheckAvailable and reply checkQuota result (#658) 2022-09-22 21:54:45 +08:00
AngersZhuuuu
df5ba55ea5
[ISSUE-633][FEATURE] Support provider user identity by customized class and keep LifecycleManager and ShuffleClient user identity consistence (#646) 2022-09-21 17:35:59 +08:00
Ethan Feng
3c917c577b
Fix worker replied ack at the wrong time when a soft split is triggered. (#645) 2022-09-21 15:07:21 +08:00
Cheng Pan
b51abeed96
Improve code smell (#624) 2022-09-20 10:03:02 +08:00
Keyong Zhou
30a5afb816
[ISSUE-625][BUG] Incorrect result when kill worker while pushMergedData (#627) 2022-09-20 00:05:15 +08:00
AngersZhuuuu
e48efb2e1c
[ISSUE-611][BUG] FetchHandler should handle PartitionFileSorter return null and we should enable retry for sorter exception (#615) 2022-09-19 14:51:46 +08:00
nafiy
75ca396e77
[ISSUE-600][Refactor] Translate Chinese comments to English (#605) 2022-09-15 22:24:39 +08:00
Keyong Zhou
0dc7e82006
improve revive log readability. (#603) (#604)
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
2022-09-14 23:25:49 +08:00
AngersZhuuuu
a6acaa11e0
[ISSUE-597][REFACTOR] Unify Enum type name and correct wrong UN_KOWN (#598) 2022-09-13 19:07:48 +08:00
nafiy
01d138bea4
[ISSUE-578][FEATURE] Add unit test for codec (#586) 2022-09-11 17:08:45 +08:00
Keyong Zhou
e0c4779fac
[ISSUE-591][BUG] Incorrect result when revive and split happen concur… (#592) 2022-09-10 23:30:39 +08:00
Keyong Zhou
1d7fec84da
[ISSUE-588][BUG] Fix memory leak in shuffle read (#589) 2022-09-10 22:07:13 +08:00
nafiy
0a60b21b56
[ISSUE-551][BUG] CompressionMethod and checksum are not consistent when zstd level is negative (#577) 2022-09-10 21:39:51 +08:00
Keyong Zhou
a2cd01b8ef
[ISSUE-567][FOLLOW-UP] remove entry from latestPartitionLocation in removeExpiredShuffle (#575) 2022-09-08 11:21:42 +08:00
AngersZhuuuu
da7ac1721b
[ISSUE-565][REFACTOR] Unify RPC name HeartbeatXxxxx (#566) 2022-09-07 21:33:18 +08:00
Keyong Zhou
f0b6346c9f
[ISSUE-567] Optimize LifecycleManager.getLatestPartition (#570) 2022-09-07 21:06:49 +08:00
nafiy
644471debb
[ISSUE-516][FEATURE] Worker should clean remaining directory when start before registering to Master (#540) 2022-09-06 23:37:47 +08:00
AngersZhuuuu
35d5b587ec
[Refactor] Modify package name of utils to keep consistence (#536) 2022-09-05 20:06:54 +08:00
AngersZhuuuu
f7211204f2
[ISSUE-534][REFACTOR] Refactor log when call handleGetReducerFileGroup (#535) 2022-09-05 19:48:57 +08:00
Cheng Pan
4b42219595
Remove log4j1 (#501) 2022-09-05 19:30:15 +08:00