Shuang
935806f036
[CELEBORN-341][Flink] cache file group for map partition in Flink plugin ( #1277 )
2023-02-26 20:31:20 +08:00
Angerszhuuuu
a7587c3fe7
[CELEBORN-337] Remove unnecessary StatusCode.message ( #1272 )
2023-02-24 15:11:07 +08:00
Angerszhuuuu
81f7ffd767
[CELEBORN-332] Unify the log of ShuffleClientImpl ( #1267 )
2023-02-24 14:07:25 +08:00
Angerszhuuuu
3067efcfd3
[CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too ( #1266 )
2023-02-23 14:57:11 +08:00
Angerszhuuuu
f7948190cf
[CELEBORN-316][FOLLOWUP] Should not wrap CelebornIOException with CelebornIOException ( #1264 )
2023-02-23 11:48:46 +08:00
Angerszhuuuu
1132cc25ab
[CELEBORN-328][IMPROVEMENT] Too much noisy log when reserve slot failed ( #1262 )
2023-02-22 17:19:52 +08:00
Angerszhuuuu
322f0d2b41
[CELEBORN-316] Wrap Celeborn exception with CelebornIOException ( #1253 )
2023-02-22 16:10:11 +08:00
Shuang
3da615972e
[CELEBORN-326] [Flink] lifecycleManager supports flink-yarn-session mode to handle multiple Flink jobs. ( #1260 )
2023-02-22 15:37:24 +08:00
Angerszhuuuu
251b923b5b
[CELEBORN-321] When register shuffle failed, DataPushQueue should directly take the task queue to avoid NPE ( #1258 )
2023-02-21 17:02:37 +08:00
Shuang
61065230bd
[CELEBORN-311] not retry when register for map partition occurs exception ( #1246 )
2023-02-21 10:16:10 +08:00
Ethan Feng
bfb39632d9
[CELEBORN-235] Implement flink plugin. ( #1244 )
2023-02-17 19:31:12 +08:00
zhongqiangchen
b5dc106af8
[CELEBORN-291] optimize shuffleclientimpl creating client and pushdata for mappartition ( #1224 )
2023-02-17 19:07:19 +08:00
Shuang
b7ef9cf216
[CELEBORN-297] don't cache file groups for map partition shuffle type ( #1237 )
2023-02-17 11:28:47 +08:00
Angerszhuuuu
57f775a7e9
[CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistence ( #1208 )
2023-02-16 18:27:37 +08:00
jiaoqingbo
318157e3e9
[CELEBORN-305] Change the parameter passed in the registerShuffle method to numPartitions instead of numMappers ( #1240 )
2023-02-15 17:35:43 +08:00
jiaoqingbo
bd9e0ddc1f
[CELEBORN-304] Missing setIfMissing celeborn.$module.io.serverThreads ( #1238 )
2023-02-15 15:49:08 +08:00
Shuang
75c83093f2
[CELEBORN-296] fix map partition commit using wrong partitionId and result ( #1233 )
2023-02-14 20:54:06 +08:00
Rex(Hui) An
bff6e91e0b
[CELEBORN-227] Support different push strategies to control the push speed ( #1167 )
2023-02-07 14:24:30 +08:00
Angerszhuuuu
ff683ffc91
[CELEBORN-238][IMPROVEMENT] Revive caused by PUSH_DATA_TIMEOUT_MASTER and PUSH_DATA_TIMEOUT_SLAVE should add corresponding worker into blacklist ( #1180 )
2023-02-03 17:47:24 +08:00
Angerszhuuuu
4b6f7e4593
[CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too ( #1185 )
2023-02-03 11:53:15 +08:00
Rex(Hui) An
021004714b
[CELEBORN-264] InFlight requests should not be expired if it's not pushed yet ( #1196 )
2023-02-01 22:16:55 +08:00
Shuang
7162be2fae
[CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker ( #1149 )
2023-01-31 18:53:36 +08:00
Angerszhuuuu
1311fb53d1
[CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should have their own ERROR type ( #1181 )
...
* [CELEBORN-243][IMPROVEMENT] Create push client failed should have a ERROR type
2023-01-30 17:47:22 +08:00
Angerszhuuuu
8611a64400
[CELEBORN-237][IMPROVEMENT] push failed error message should show partition info ( #1178 )
...
* [CELEBORN-237][IMPROVEMENT] push failed error message should show partition info
2023-01-28 18:41:54 +08:00
Keyong Zhou
e47f1e33b0
[CELEBORN-55][FOLLOWUP] Code refine ( #1175 )
2023-01-20 16:22:47 +08:00
zy.jordan
c5be79ee3d
[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker ( #1102 )
2023-01-20 10:18:45 +08:00
zhongqiangczq
1836fe187b
[CELEBORN-197] in mappartition, check transportClient whether changed while sending messages ( #1145 )
2023-01-13 16:45:26 +08:00
Shuang
810a8d01e0
[CELEBORN-212] refresh client if current client is inactive. ( #1159 )
2023-01-11 11:54:50 +08:00
Shuang
1332362bff
[CELEBORN-213] Add configuration for whether to close idle connections in client side ( #1157 )
2023-01-10 19:13:33 +08:00
Angerszhuuuu
e155ec122a
[CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice ( #1136 )
2023-01-10 11:39:40 +08:00
Shuang
2ec06472fe
[CELEBORN-203] fix NPE when removeExpiredShuffle in LifecycleManager. ( #1151 )
2023-01-06 18:32:17 +08:00
Angerszhuuuu
0d5809ff0c
[CELEBORN-192][IMPROVEMENT] Change FAILED status to REQUEST_FAILED since it's all used when RPC request failed. ( #1139 )
2023-01-06 16:53:04 +08:00
Shuang
3b2be25a50
[CELEBORN-173] refactor minicluster and fix ut ( #1147 )
2023-01-05 20:39:19 +08:00
Angerszhuuuu
415452d9c4
[CELEBORN-189][IMPROVEMENT] PushDataFailedSlave should add slave worker to blacklist ( #1135 )
2023-01-05 20:12:07 +08:00
Angerszhuuuu
fe8dfb05f3
[CELEBORN-196][REFACTOR] Rename batchHandleRequestPartitions to handleRequestPartitions ( #1144 )
2023-01-05 14:37:10 +08:00
Angerszhuuuu
2315f2f988
[CELEBORN-191][BUG] ShuffleClient registerShuffle return RESERVE_SLOTS_FAILED should also been print out ( #1138 )
2023-01-03 17:13:31 +08:00
Shuang
5cba307189
[CELEBORN-146] refactor ShuffleMapperAttempts & GetReducerFileGroup ( #1116 )
2022-12-30 18:15:23 +08:00
Cheng Pan
b8758a7cb6
[CELEBORN-181][TEST] Rename RssFunSuite to CelebornFunSuite ( #1125 )
2022-12-29 18:10:14 +08:00
RexAn
6432a129be
[CELEBORN-61][CELEBORN-62][FOLLOW_UP] Fix some issues for slow start ( #1119 )
2022-12-29 12:07:20 +08:00
Binjie Yang
63943cd5cc
[CELEBORN-147][IT]Extraction of common integration test cases ( #1092 )
2022-12-29 12:03:09 +08:00
Keyong Zhou
2f0682265e
[CELEBORN-119] Add timeout for pushdata ( #1097 )
2022-12-20 20:40:42 +08:00
Keyong Zhou
a2dd72f20c
[CELEBORN-155] Wrong TimeUnit for registerShuffleRetryWait in Shuffle… ( #1099 )
2022-12-19 17:32:18 +08:00
Shuang
13769f0f0a
[CELEBORN-121] Refactor batchHandleCommitPartition ( #1089 )
2022-12-19 12:39:39 +08:00
Ethan Feng
39394526a8
[CELEBORN-142]Keep committed partition locations semantic consistent when commit files on HDFS. ( #1091 )
2022-12-16 19:02:02 +08:00
nafiy
ddab27a1d7
[CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse ( #1093 )
2022-12-15 18:16:34 +08:00
Ethan Feng
65cb36c002
[CELEBORN-83][FOLLOWUP] Fix various bugs when using HDFS as storage. ( #1065 )
2022-12-15 15:20:29 +08:00
Shuang
e3576e4e7a
[CELEBORN-117] refactor CommitManager, implements M/R Partition Commi… ( #1060 )
2022-12-15 11:09:59 +08:00
Cheng Pan
ec371c0026
[CELEBORN-132] ShuffleClient should not implement Cloneable ( #1077 )
2022-12-14 10:04:39 +08:00
Angerszhuuuu
c924a4ff0d
[CELEBORN-61][CELEBORN-62][FEATURE] Shuffle client support slow start, congestion avoidance and congestion control ( #1052 )
2022-12-08 12:41:34 +08:00
zhongqiangczq
60f6f87832
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write:pushdata ( #1036 )
2022-12-08 12:31:47 +08:00
zhongqiangczq
d3d40f730c
[CELEBORN-106] flink-plugin supports shufflewrite:OutputGate ( #1051 )
2022-12-08 11:24:37 +08:00
Shuang
e2196e9383
[CELEBORN-56] [ISSUE-945] handle map partition mapper end ( #1003 )
2022-12-07 21:09:02 +08:00
Shuang
f3f104870c
[CELEBORN-75] Initialize flink plugin module ( #1027 )
2022-12-07 15:53:00 +08:00
Angerszhuuuu
0d38bad78a
[CELEBORN-20][REFACTOR] Extract CommitManager from LifecycleManager ( #1050 )
2022-12-06 22:26:18 +08:00
Angerszhuuuu
1e4dec96b9
[CELEBORN-21][REFACTOR] Extract revive related logical from LifecycleManager ( #1024 )
2022-12-05 17:05:17 +08:00
Angerszhuuuu
5eaad136a0
[CELEBORN-84][IMPROVEMENT] Blacklist critical reason should avoid been covered by normal reason ( #1043 )
2022-12-05 14:02:33 +08:00
nafiy
8e384cda5a
[CELEBORN-88][REFACTOR] Revive/PartitionSplit should set separated timeout configuration ( #1046 )
2022-12-05 10:36:43 +08:00
nafiy
44d45c2a27
[CELEBORN-90][REFACTOR] GetReducerFileGroup should support separated timeout configuration ( #1045 )
2022-12-02 22:53:51 +08:00
Shuang
3a4c3c03a0
[CELEBORN-76][FOLLOWUP] fix inFlightCommitRequest counting problem ( #1034 )
2022-12-02 16:25:59 +08:00
nafiy
13e1e24035
[CELEBORN-86][REFACTOR] Register shuffle should have separated timeout configuration ( #1031 )
2022-12-01 18:39:56 +08:00
zhongqiangczq
898d1126a6
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write: send handshake/regionstart/regionfinish ( #1035 )
2022-12-01 11:20:55 +08:00
RexAn
bb5a4d2180
[CELEBORN-63] Add CONGESTION related status codes ( #1028 )
...
* Increase push data return reason types such as CONGESTION etc.
2022-12-01 10:55:37 +08:00
Angerszhuuuu
7f8e66afbc
[CELEBORN-76][FOLLOWUP] Support batch commit hard split partition before stage end ( #1030 )
2022-11-30 19:42:04 +08:00
Ethan Feng
dd02070e4b
[CELEBORN-83] Fix various bug when using HDFS as storage.
...
1. fix incompatibility between Hadoop 2 and Hadoop 3.
2. fix hdfs writer will never be called when there are no healthy disks.
3. fix an NPE when HDFS file writer close.
2022-11-30 19:33:18 +08:00
Angerszhuuuu
5ad4415c68
[CELEBORN-78][REFACTOR] Extract heartbearter from LifecycleManager ( #1021 )
2022-11-29 19:14:55 +08:00
Angerszhuuuu
01dc9d4259
[CELEBORN-79][REFACTOR] Remove unused responseCheckerThread from LifecycleManager ( #1022 )
2022-11-29 15:25:37 +08:00
Angerszhuuuu
d26e73209b
[CELEBORN-76] Support batch commit hard split partition before stage end
2022-11-29 13:09:01 +08:00
Angerszhuuuu
13f4ce2be6
[CELEBORN-68][FOLLOWUP] Retry on same partition location should have a retry wait interval ( #1017 )
2022-11-28 20:17:08 +08:00
Keyong Zhou
d381df71f8
[CELEBORN-70] Add epoch for each commitFiles request ( #1012 )
2022-11-27 21:05:14 +08:00
nafiy
817eee969f
[CELEBORN-58][REFACTOR] Aggregate reserve failed logs together ( #1005 )
2022-11-26 20:56:39 +08:00
Keyong Zhou
f8bb2cd47d
[CELEBORN-12]Retry on CommitFile request ( #1011 )
2022-11-26 20:56:24 +08:00
Keyong Zhou
9214b82181
[CELEBORN-68] Client might fetch incorrect data chunk ( #1010 )
2022-11-26 18:06:06 +08:00
Ethan Feng
93dbf3f8b1
[CELEBORN-67] Revert "Fix fetch incorrect data chunk" related commits ( #1006 )
...
* Revert "[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999 )"
This reverts commit 1e8f6dc5e8 .
* Revert "[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000 )"
This reverts commit f1c4d675d6 .
* Revert "[CELEBORN-49] Deadlock when kill worker in shuffle read (#998 )"
This reverts commit 0be4b3399c .
* Revert "[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995 )"
This reverts commit 2b05228871 .
* Revert "[BUG] Fix fetch incorrect data chunk (#926 )"
This reverts commit 6f043f8a
* Revert "[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback (#954 )"
This reverts commit 64e8ebf1
2022-11-25 20:57:47 +08:00
nafiy
fe13e9e261
[CELEBORN-59][REFACTOR] Support send destroy slots request in parallel ( #1004 )
2022-11-25 18:26:05 +08:00
Angerszhuuuu
1e8f6dc5e8
[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data ( #999 )
...
* [CELEBORN-48][BUG] Channel inactive may cause new client use old stream id to fetch data
2022-11-23 18:22:06 +08:00
Ethan Feng
f1c4d675d6
[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. ( #1000 )
2022-11-23 18:07:57 +08:00
Keyong Zhou
0be4b3399c
[CELEBORN-49] Deadlock when kill worker in shuffle read ( #998 )
2022-11-23 17:31:05 +08:00
Angerszhuuuu
2b05228871
[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk ( #995 )
2022-11-23 11:56:10 +08:00
Keyong Zhou
cfc1fa15bd
[CELEBORN-46] Refine log for RssInputStream.close() ( #994 )
2022-11-22 22:01:08 +08:00
Shuang
1656458788
[CELEBORN-14] [ISSUE-955] support register attempt map task ( #984 )
2022-11-22 15:23:20 +08:00
Angerszhuuuu
5ec278f99a
[ISSUE-987][FEATURE] During worker shutdown, return HARD_SPLIT for all existed partition ( #988 )
2022-11-22 14:29:55 +08:00
Shuang
fb6d1de108
[CELEBORN-8] [ISSUE-952][FEATURE] support register shuffle task in map partition mode ( #973 )
2022-11-16 21:46:19 +08:00
Angerszhuuuu
64e8ebf158
[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback ( #954 )
2022-11-11 14:00:47 +08:00
leesf
0b8376e2c7
Cleanup some code ( #943 )
2022-11-11 13:58:39 +08:00
Ethan Feng
6f043f8ae9
[BUG] Fix fetch incorrect data chunk ( #926 )
2022-11-09 22:31:39 +08:00
leesf
3699683a3b
Fix and migrate some configs ( #927 )
2022-11-07 09:41:38 +08:00
Angerszhuuuu
38e15d89e6
[ISSUE-902][IMPROVEMENT][FOLLOWUP] LifecycleManager should reserve blacklist with irrecoverable status ( #914 )
2022-11-04 15:54:45 +08:00
Angerszhuuuu
e68ca75a9e
[ISSUE-902][BUG] LifecycleManager should not reallocate slots in failed worker during retry ( #906 )
2022-11-02 21:07:28 +08:00
leesf
f1694f3d20
[MINOR][CLEANUP] clean up some code in LifecycleManager and ShuffleClientImpl ( #896 )
2022-11-01 11:40:19 +08:00
Angerszhuuuu
87fcfa767f
[ISSUE-887][REFACTOR] Configuration type convert to Enum ( #888 )
...
* [ISSUE-332][FOLLOWUP] Add deps in worker's pom
* [Refactor] Modify package name of utils to keep consistence
* [Refactor] Modify package name of utils to keep consistence
* [REFACTOR] Remove unused isRegistered in controller
* [ISSUE-887][REFACTOR] Configuration type convert to Enum
* update
* update
* Update RssShuffleManager.java
2022-10-29 13:41:06 +08:00
Cheng Pan
d7be6006e7
Migrate network related conf to structured conf system ( #875 )
...
* Migrate network related conf to structured conf system
* migrate
* fix
* fix
* worker
* fix
* nit
* review
* nit
2022-10-28 10:45:52 +08:00
Angerszhuuuu
f9ecde3b2b
[ISSUE-863][BUG]LifecycleManager should ignore change partition request when shuffle ended and not remove workersnapshot when commit success ( #864 )
2022-10-27 22:04:18 +08:00
Ethan Feng
8800fc4a8e
[Refactor] Refine rpc cache configs ( #853 )
...
* refine rpc cache configs.
* update.
* update.
* update.
2022-10-25 20:28:18 +08:00
Ethan Feng
45ef716737
[Feature] Cache GetReducerFileGroupResponse to avoid lifecycle manager oom. ( #792 )
2022-10-25 16:16:44 +08:00
AngersZhuuuu
2ebf873b3c
[ISSUE-845][REFACTOR] Migrate partition split related conf to Celeborn Configuration System ( #846 )
2022-10-25 10:54:45 +08:00
AngersZhuuuu
0bd0a3e9f4
[ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System ( #848 )
...
* [ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System
* Update CelebornConf.scala
* follow comments
* update
* update
* update
* Update client.md
2022-10-25 09:16:46 +08:00
AngersZhuuuu
0fdb19065a
[ISSUE-841][REFACTOR] Migrate shuffle client side conf to Celeborn Configuration System ( #842 )
2022-10-24 20:48:48 +08:00
Keyong Zhou
63752e7a37
[BUG] RegisterShuffle should not increase epoch ( #833 )
2022-10-23 23:40:32 +08:00
nafiy
d0058fb2c5
[ISSUE-780][REFACTOR] Refactor PartitionLocation's methods ( #791 )
2022-10-22 22:46:45 +08:00
AngersZhuuuu
f2610e3b6f
[ISSUE-829][REFACTOR] Unify name of PUSH_DATA_FAIL_MAIN ( #830 )
2022-10-21 19:06:33 +08:00
AngersZhuuuu
a773c8e6db
[ISSUE-820][Refactor] Rename RssConf to CelebornConf ( #826 )
2022-10-20 20:13:13 +08:00
AngersZhuuuu
8344479df1
[ISSUE-818][REFACTOR] Move existing RssConf.xxx conf method to RssConf class ( #822 )
Co-authored-by: Ethan Feng <ethan.aquarius.fmx@gmail.com>
2022-10-20 18:10:59 +08:00
Ethan Feng
5c761a8df3
[ISSUE-813][Refactor] Refactor flusher configurations. ( #813 )
...
* Refactor flusher configurations.
* Refactor flusher configurations.
* Update.
* remove brackets.
* update docs.
* rename.
* update.
* update docs.
* update.
* update.
* update.
* update.
* update.
* update.
* update.
* format.
* update.
* update.
2022-10-20 15:23:17 +08:00
nafiy
a75bce905e
[ISSUE-805][REFACTOR] Remove UserIdentifier out of ControlMessage ( #808 )
2022-10-19 15:32:53 +08:00
AngersZhuuuu
7fedaaeca1
[ISSUE-795][BUG] Batch handle change partition throw NPE ( #796 )
2022-10-19 10:54:08 +08:00
Ethan Feng
bff2a7065b
Keep one copy of roaringbitmap to reduce memory usage. ( #790 )
2022-10-18 13:26:49 +08:00
Cheng Pan
efad4abb5d
Migrate a bunch of configurations ( #786 )
2022-10-18 10:44:01 +08:00
Cheng Pan
ea67f4e060
Introduce categories to ConfigEntry and migrate configurations ( #775 )
2022-10-17 16:56:54 +08:00
Cheng Pan
96e969f46e
[BUILD] Extract project.version to Maven Property ( #772 )
2022-10-16 19:01:40 +08:00
AngersZhuuuu
c9b462dc02
[ISSUE-770][Refactor] Batch handle change partition should ignore empty batch and avoid print log of empty result ( #771 )
2022-10-14 21:49:37 +08:00
AngersZhuuuu
3bad403c8b
[ISSUE-768][REFACTOR] Shuffle data lost should show more clear about lost data in which worker ( #769 )
2022-10-14 11:41:15 +08:00
Cheng Pan
f01a696313
Migrate and refactor configuration for master endpoints ( #752 )
2022-10-11 21:33:21 +08:00
AngersZhuuuu
bbb4f8e225
[ISSUE-306][IMPROVEMENT] Handle change partition request in batch ( #622 )
2022-10-10 18:31:37 +08:00
AngersZhuuuu
f2a234f870
[ISSUE-739][REFACTOR] Use object wrap pb message method ( #740 )
2022-10-09 11:53:48 +08:00
AngersZhuuuu
ae4bb12d5e
[ISSUE-630][REFACTOR] Minor change of storage resource quota, include code style, comment unused code etc.. ( #728 )
2022-10-08 20:15:25 +08:00
Ethan Feng
96e550f81c
Fix a npe that stuck lifecycle manager when a worker is offline. ( #733 )
2022-10-08 20:11:42 +08:00
Ethan Feng
6deda248ac
[REFACTOR]move lifecycle manager to correct package. ( #730 )
2022-10-08 18:14:08 +08:00
Cheng Pan
ab16b4f101
[INFRA] Rename modules w/ celeborn prefix ( #723 )
2022-10-08 08:05:57 +08:00
Cheng Pan
abb4ce6405
Drop control message Scala wrapper - Revive/PartitionSplit/ChangeLocationResponse ( #720 )
2022-10-07 12:40:23 +08:00
Cheng Pan
a719709a17
Drop control message Scala wrapper - UnregisterShuffle/UnregisterShuffleResponse ( #718 )
2022-10-07 12:29:10 +08:00
Cheng Pan
cda133e11f
Drop control message Scala wrapper - RegisterShuffle/RegisterShuffleResponse ( #716 )
2022-10-06 23:37:36 +08:00
Keyong Zhou
a2d2379153
[DOC] Replace RSS with Celeborn in docs ( #715 )
2022-10-06 10:37:46 +08:00
Cheng Pan
4880d78d6a
Extract spark tests and improve pom ( #711 )
2022-10-04 10:23:26 +08:00
Keyong Zhou
fe3b5988f2
[REFACTOR] Change package name to org.apache.celeborn ( #710 )
2022-10-02 18:10:29 +08:00
nafiy
5d4533fb85
[ISSUE-632][FEATURE] LifecycleManager side ReserveSlots & RequestSlots RPC with UserIdentifier ( #679 )
2022-09-27 00:01:44 +08:00
zky.zhoukeyong
a2522745d2
Revert "Drop control message Scala wrapper - RemoveExpiredShuffle ( #676 )"
...
This reverts commit a160cd90cb .
2022-09-25 17:18:41 +08:00
Cheng Pan
a160cd90cb
Drop control message Scala wrapper - RemoveExpiredShuffle ( #676 )
2022-09-24 23:23:36 +08:00
Ethan Feng
30d4323cdb
[FEATURE] Add a configuration to enable a map id filter mechanism. #662 ( #663 )
2022-09-23 18:38:52 +08:00
Ethan Feng
4a7a7d42b5
[FEATURE] Add metrics about fetch chunk size, commit files time and get reducer file time ( #661 )
2022-09-23 16:05:28 +08:00
Ethan Feng
b4654d788c
[ISSUE-607]Add map ids info for each PartitionLocation to enable filtering for m… ( #619 )
2022-09-23 15:21:41 +08:00
AngersZhuuuu
a6b8af2b00
[ISSUE-637][FEATURE] Change CheckAlive to CheckAvailable and reply checkQuota result ( #658 )
2022-09-22 21:54:45 +08:00
AngersZhuuuu
df5ba55ea5
[ISSUE-633][FEATURE] Support provider user identity by customized class and keep LifecycleManager and ShuffleClient user identity consistence ( #646 )
2022-09-21 17:35:59 +08:00
Ethan Feng
3c917c577b
Fix worker replied ack at the wrong time when a soft split is triggered. ( #645 )
2022-09-21 15:07:21 +08:00
Cheng Pan
b51abeed96
Improve code smell ( #624 )
2022-09-20 10:03:02 +08:00
Keyong Zhou
30a5afb816
[ISSUE-625][BUG] Incorrect result when kill worker while pushMergedData ( #627 )
2022-09-20 00:05:15 +08:00
AngersZhuuuu
e48efb2e1c
[ISSUE-611][BUG] FetchHandler should handle PartitionFileSorter return null and we should enable retry for sorter exception ( #615 )
2022-09-19 14:51:46 +08:00
nafiy
75ca396e77
[ISSUE-600][Refactor] Translate Chinese comments to English ( #605 )
2022-09-15 22:24:39 +08:00
Keyong Zhou
0dc7e82006
improve revive log readability. ( #603 ) ( #604 )
...
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
2022-09-14 23:25:49 +08:00
AngersZhuuuu
a6acaa11e0
[ISSUE-597][REFACTOR] Unify Enum type name and correct wrong UN_KOWN ( #598 )
2022-09-13 19:07:48 +08:00
nafiy
01d138bea4
[ISSUE-578][FEATURE] Add unit test for codec ( #586 )
2022-09-11 17:08:45 +08:00
Keyong Zhou
e0c4779fac
[ISSUE-591][BUG] Incorrect result when revive and split happen concur… ( #592 )
2022-09-10 23:30:39 +08:00
Keyong Zhou
1d7fec84da
[ISSUE-588][BUG] Fix memory leak in shuffle read ( #589 )
2022-09-10 22:07:13 +08:00
nafiy
0a60b21b56
[ISSUE-551][BUG] CompressionMethod and checksum are not consistent when zstd level is negative ( #577 )
2022-09-10 21:39:51 +08:00
Keyong Zhou
a2cd01b8ef
[ISSUE-567][FOLLOW-UP] remove entry from latestPartitionLocation in removeExpiredShuffle ( #575 )
2022-09-08 11:21:42 +08:00
AngersZhuuuu
da7ac1721b
[ISSUE-565][REFACTOR] Unify RPC name HeartbeatXxxxx ( #566 )
2022-09-07 21:33:18 +08:00
Keyong Zhou
f0b6346c9f
[ISSUE-567] Optimize LifecycleManager.getLatestPartition ( #570 )
2022-09-07 21:06:49 +08:00
nafiy
644471debb
[ISSUE-516][FEATURE] Worker should clean remaining directory when start before registering to Master ( #540 )
2022-09-06 23:37:47 +08:00
AngersZhuuuu
35d5b587ec
[Refactor] Modify package name of utils to keep consistence ( #536 )
2022-09-05 20:06:54 +08:00
AngersZhuuuu
f7211204f2
[ISSUE-534][REFACTOR] Refactor log when call handleGetReducerFileGroup ( #535 )
2022-09-05 19:48:57 +08:00
Cheng Pan
4b42219595
Remove log4j1 ( #501 )
2022-09-05 19:30:15 +08:00