Commit Graph

1099 Commits

Author SHA1 Message Date
Ethan Feng
44a665e27e
fix empty shuffle read caused by fallback. (#151) 2022-06-14 16:08:10 +08:00
Ethan Feng
6811cc22fc
[issue-146] Add storage hint to indicate storage location. (#147) 2022-06-14 15:57:11 +08:00
AngersZhuuuu
b51a7626b2
[ISSUE-148][BUG] MapEnd but speculation task's inFlightBatch not cleaned (#149) 2022-06-13 15:44:06 +08:00
Keyong Zhou
49f1ee6088
Reuse SendBuffer among tasks in Executor (#131) 2022-06-03 11:26:30 +08:00
Ethan Feng
1113f437c6
[FEATURE] Remove dependency on spark-tags from common module (#126) (#128) 2022-05-31 15:24:08 +08:00
nafiy
491f89bbb5
[FEATURE]Add metrics source for JVM and CPU (#125)
* Add metrics source for JVM and CPU

* Fix scala style issue
2022-05-30 13:26:54 +08:00
AngersZhuuuu
730d0c4a97
[ISSUE-120] [BUG] Master's metrics of WorkerSlotsCount / WorkerSlotsUsed / OverloadWorkerCount not update (#121)
2022-05-23 19:19:24 +08:00
Ethan Feng
86adc0d244
[Feature]Add metrics documentation and grafana dashboard. (#117) 2022-05-20 12:12:41 +08:00
Ethan Feng
ac645a464b
update netty and ratis version. (#115) 2022-05-19 11:25:55 +08:00
Ethan Feng
7d04dbab92
[BUG]Fix a null pointer exception. (#116)
1. Fix a null pointer exception.
2. Add partitionlocation to inflight batches to help resolve problems.
3. Reduce driver logs.
2022-05-19 11:23:34 +08:00
leoyy0316
f79e40b21d
modify CONTRIBUTING.md and move LifecycleManager to scala source (#112)
Leo Cheng <leocheng@synnex.com>
2022-05-16 19:03:40 +08:00
Ethan Feng
e8e333a239
RSS support spark3 RDA. (#108)
2022-05-14 14:02:40 +08:00
AngersZhuuuu
d4222fb632
[ISSUE-110] ShuffleWriteTime should include pushMergeData (#111) 2022-05-13 20:58:34 +08:00
The Gitter Badger
a6e37f1731
Add Gitter badge (#105) 2022-05-13 20:57:25 +08:00
Ethan Feng
69e52d53a4
[FEATURE] deactivate all profiles by default. (#100) 2022-04-19 20:28:40 +08:00
Ethan Feng
42d0dfc51e
shade netty package. (#98) 2022-04-18 17:14:23 +08:00
Ethan Feng
3019e2712b
[bug]fix parameter position error. (#96) 2022-04-15 16:41:21 +08:00
Ethan Feng
bc0b5fca42
correct README.md (#95) 2022-04-15 14:29:38 +08:00
Ethan Feng
409da82964
[Bug]fix stuck under high memory pressure. (#90) 2022-04-14 18:53:39 +08:00
Ethan Feng
2ea136fada
[Feature]Update spark2 patch (#89) 2022-04-08 21:46:30 +08:00
Ethan Feng
db84bce438
Merge pull request #91 from FMX/optimize-ut-time
[FEATURE] Optimize unit test running time
2022-04-08 17:22:50 +08:00
mingji
8ac0167b69 optimize unit test running time. 2022-04-08 15:51:33 +08:00
mingji
a7449a9821 update netty version. 2022-04-08 10:57:53 +08:00
Ethan Feng
baa2836216
Add metrics: (#85)
1. shuffle fetch send data time.
2. open stream time.
3. memory critical count.
2022-04-02 15:05:27 +08:00
Ethan Feng
9ad8254b0a
AQE support. (#67) 2022-04-01 20:19:01 +08:00
AngersZhuuuu
86bbeea9b4
[BUG] Register shuffle with configurable retry times and retry wait time (#83) 2022-04-01 16:59:37 +08:00
AngersZhuuuu
4bd3a539a5
[ISSUE-80] When rss is in blacklist and failed for reserve, rpcRef could be null (#81) 2022-03-29 21:12:37 +08:00
AngersZhuuuu
eacb9a1217
Refactor the configuration (#72) 2022-03-11 12:03:43 +08:00
Ethan Feng
254372c418
Merge pull request #69 from lichaojacobs/lc_fix_tcp_nodelay
minor fix for useless tcp_nodelay flag
2022-03-10 15:56:11 +08:00
wangshengjie123
b2a6091b55
[Feature] Make log4j2 optional so that we can update log4j2.xml to change log level (#56) 2022-03-08 22:33:06 +08:00
lichao
44cc9d0294 minor fix for #68 2022-03-08 21:56:02 +08:00
Ethan Feng
1e62a1807d
Merge pull request #64 from AngersZhuuuu/ISSUE-63
Fix Issue 63
2022-03-07 09:52:44 +08:00
Ethan Feng
38df851940
[bug]fix a npe error. (#65) 2022-03-06 16:24:49 +08:00
Angerszhuuuu
ba632cadfd Fix Issue 63 2022-03-03 15:33:24 +08:00
Ethan Feng
780918927c
[BUG]fix error: incorrect collection type. (#62) 2022-03-02 16:22:21 +08:00
Keyong Zhou
4f66849d6a
fix NPE in LifecycleManager.handleGetBlacklist (#59) 2022-02-16 12:17:41 +08:00
wangshengjie123
710e4c2c0b
[BUG] Fix Rpc error: worker cannot send heartbeat to master (#54) 2022-02-15 17:47:29 +08:00
Ethan Feng
1db6f3f68f
Add instructions for JDK. (#48) 2022-01-29 22:47:55 +08:00
Ethan Feng
356a1952e4
Multi Client Support (#47)
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2022-01-29 22:28:06 +08:00
wangshengjie123
65d52801c4
[Feature] Split ha and non-ha config and update doc (#43) 2022-01-27 21:42:55 +08:00
Ethan Feng
1d7b59c80a
fix import order. (#49) 2022-01-27 21:23:25 +08:00
Ethan Feng
bc1adac90e
[FEATURE]Worker-Wise Current-Limiting (#44) 2022-01-26 15:27:00 +08:00
Tony Doen
302891a1b9
[BUG] ClusterLoadFallbackPolicy is not strict when a shuffle with big partitions registers (#30) 2022-01-26 15:16:01 +08:00
Keyong Zhou
040ce00aac
[FEATURE] Support Spark3.0 (#42) 2022-01-18 17:31:42 +08:00
Keyong Zhou
0f27a53c27
fix compile error with spark3.2.0 (#39) 2022-01-17 15:27:57 +08:00
Keyong Zhou
31dc2cf7da
[BUG] Record failed worker in LifecycleManager instead of reporting to Master (#34) 2022-01-07 12:18:56 +08:00
wangshengjie123
70d770017e
[BUG] Correct the pom for shuffle-manager-2 to include com.aliyun.emr:client (#25) 2021-12-29 10:32:43 +08:00
zky.zhoukeyong
ba5920acde Initial Commit for RSS 2021-12-28 20:57:35 +08:00
Alibaba OSS
0d29f88ada
Initial commit 2021-12-10 16:57:16 +08:00