Commit Graph

42 Commits

Author SHA1 Message Date
Ethan Feng
f3bcb7f6a8
[ISSUE-146]update slots distribution mechanism (#273) 2022-08-12 23:38:19 +08:00
Keyong Zhou
d166e042be
[ISSUE-329] Should not sleep if reserve slots successfully in reserveSlotsWithRetry (#330) 2022-08-12 12:27:27 +08:00
AngersZhuuuu
cf2b895afb
[ISSUE-293][REFACTOR] Init worker rpc endpoint and reserve slot in parallel to speed up register shuffle process (#294)
2022-08-03 20:00:30 +08:00
AngersZhuuuu
e57ad27887
[ISSUE-291][REFACTOR] When worker endpoint initializing failed, print clear warning log (#292) 2022-08-02 12:03:59 +08:00
dxheming
8e3f48ec12
Refactor deprecated netty ConcurrentSet (#285) 2022-07-27 20:35:46 +08:00
AngersZhuuuu
7a760466aa
[ISSUE-281][BUG] Use correct maxDestLength to check if buffer can satisfy compress result (#282) 2022-07-26 15:56:05 +08:00
AngersZhuuuu
9324b1e89a
[ISSUE-257][FEATURE] Reserve slots support customized retry times (#258) 2022-07-26 15:23:25 +08:00
AngersZhuuuu
fe17914942
Refactor pom import issue (#277) 2022-07-25 17:49:55 +08:00
Keyong Zhou
6442f38a33
[ISSUE-267] Extend API to support more partition types: MapPartition,… (#268) 2022-07-17 16:28:37 +08:00
Keyong Zhou
56a0b9072b
[ISSUE-261] Refine message class hierarchy (#266) 2022-07-16 17:00:09 +08:00
Keyong Zhou
7da8f64691
[ISSUE-262] Remove unused bootstrap (#263) 2022-07-16 11:01:44 +08:00
AngersZhuuuu
36cc234dd4
[ISSUE-246][REFACTOR] Refactor LifecycleManager to make it's code more clear and more readable (#252) 2022-07-12 15:37:49 +08:00
Keyong Zhou
691beb7889
[ISSUE-247] Extract PushHandler, FetchHandler, RpcHandler from Worker… (#251) 2022-07-12 11:40:42 +08:00
Keyong Zhou
d8c5758124
[ISSUE-249] Fix OutOfBounds when shuffle has no data(q24b) (#250) 2022-07-10 18:03:54 +08:00
AngersZhuuuu
f80c86a675
[ISSUE-222] Destroy and DestroyResponse should remove null check (#238) 2022-07-09 15:44:17 +08:00
AngersZhuuuu
49caced462
[ISSUE-222][BUG] GetReduceFileGroups should remove code about return null value (#236) 2022-07-09 12:14:08 +08:00
AngersZhuuuu
c28eeb078c
[ISSUE-222] CommitFiles and CommitFilesResponse should remove null check (#237) 2022-07-08 22:32:54 +08:00
AngersZhuuuu
6e5c282229
[ISSUE-222] GetBlacklist/GetBlacklistResponse should replace null value with empty list (#235) 2022-07-08 14:49:09 +08:00
AngersZhuuuu
d2a0ad480e
[ISSUE-222][BUG] RequestSlotResponse/RegisterShuffleResponse should handle null issue (#226) 2022-07-08 12:33:40 +08:00
AngersZhuuuu
736a3e8814
[ISSUE-222][BUG] handleChangePartitionLocation should handle oldPartition == null (#224) 2022-07-07 22:48:19 +08:00
Ethan Feng
04148fef2b
[ISSUE-228]Fix unexpected closed exceptions occurred while committing files. (#232) 2022-07-07 22:15:16 +08:00
Keyong Zhou
49f2a00943
[ISSUE-208] Refine log levels (#210) 2022-07-01 14:57:30 +08:00
AngersZhuuuu
506cc0af9c
[ISSUE-171][BUG] LifeCycleManager throw scala.collection.immutable.HashMap$HashTrieMap cannot be cast to java.util.HashMap when handle destroyBuffersWithRetry (#172)
2022-06-28 10:45:16 +08:00
AngersZhuuuu
5c82b763eb
[ISSUE-169][FEATURE] Make app heartbeat interval can be customized (#170)
* Update LifecycleManager.scala
2022-06-27 20:58:00 +08:00
mingji
d4d8eb3838
update pom version. 2022-06-24 14:28:42 +08:00
AngersZhuuuu
73b41ac8c5
[ISSUE-160] [BUG] requestReserveSlot failed loss root cause (#161) 2022-06-23 16:33:41 +08:00
AngersZhuuuu
84a281ff89
[ISSUE-158][BUG] When revive meet reserve slot filed, will throw ArrayBoundOutOfIndex exception (#159)
* Update pom.xml
2022-06-23 16:15:38 +08:00
AngersZhuuuu
146f724a15
ISSUE-152. Show target host:port when push data callback onFailure (#153) 2022-06-17 22:09:17 +08:00
Ethan Feng
6811cc22fc
[issue-146] Add storage hint to indicate storage location. (#147) 2022-06-14 15:57:11 +08:00
AngersZhuuuu
b51a7626b2
[ISSUE-148][BUG] MapEnd but speculation task's inFlightBatch not cleaned (#149) 2022-06-13 15:44:06 +08:00
Ethan Feng
7d04dbab92
[BUG]Fix a null pointer exception. (#116)
* 1. Fix a null pointer exception.
2. Add partitionlocation to inflight batches to help resolve problems.
3. Reduce driver logs.
2022-05-19 11:23:34 +08:00
leoyy0316
f79e40b21d
modify CONTRIBUTING.md and move LifecycleManager to scala source (#112)
Leo Cheng <leocheng@synnex.com>
2022-05-16 19:03:40 +08:00
Ethan Feng
409da82964
[Bug]fix stuck under high memory pressure. (#90) 2022-04-14 18:53:39 +08:00
Ethan Feng
9ad8254b0a
AQE support. (#67) 2022-04-01 20:19:01 +08:00
AngersZhuuuu
86bbeea9b4
[BUG] Register shuffle with configurable retry times and retry wait time (#83) 2022-04-01 16:59:37 +08:00
AngersZhuuuu
4bd3a539a5
[ISSUE-80] When rss is in blacklist and failed for reserve, rpcRef could be null (#81) 2022-03-29 21:12:37 +08:00
Keyong Zhou
4f66849d6a
fix NPE in LifecycleManager.handleGetBlacklist (#59) 2022-02-16 12:17:41 +08:00
Ethan Feng
356a1952e4
Multi Client Support (#47)
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2022-01-29 22:28:06 +08:00
Ethan Feng
bc1adac90e
[FEATURE]Worker-Wise Current-Limiting (#44) 2022-01-26 15:27:00 +08:00
Tony Doen
302891a1b9
[BUG] ClusterLoadFallbackPolicy is not strictness when a shuffle with big partitions to register (#30) 2022-01-26 15:16:01 +08:00
Keyong Zhou
31dc2cf7da
[BUG] Record failed worker in LifecycleManager instead of reporting to Master (#34) 2022-01-07 12:18:56 +08:00
zky.zhoukeyong
ba5920acde
Initial Commit for RSS 2021-12-28 20:57:35 +08:00