33 Commits

Author SHA1 Message Date
nafiy
96b14e2205
[ISSUE-304][BUG]HA port being occupied makes master cannot normally launch (#317)
2022-08-16 20:37:01 +08:00
Cheng Pan
f1f4b894af
Build: Enhance build system (#349) 2022-08-15 14:59:01 +08:00
Keyong Zhou
937ac54e7c
[ISSUE-351] Trigger split when reaching disk space limitation (#356) 2022-08-15 00:24:25 +08:00
Keyong Zhou
c2672c2d9d
[ISSUE-273][FOLLOW-UP] 1.Heartbeat use workerInfo's diskInfos instead… (#352) 2022-08-14 16:54:08 +08:00
Keyong Zhou
20a3ba4e56
[ISSUE-273][FOLLOW-UP] Merge MountInfo with DiskInfo (#348) 2022-08-13 22:58:13 +08:00
Keyong Zhou
9516a63eb5
[ISSUE-273][FOLLOW-UP] Remove duplicate handleWorkerHeartBeat (#347) 2022-08-13 18:32:47 +08:00
Keyong Zhou
6d1a2db663
[ISSUE-273][FOLLOW-UP] Fix IndexOutOfBoundsException when release slots (#344)
```
java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at com.aliyun.emr.rss.service.deploy.master.clustermeta.AbstractMetaManager.updateReleaseSlotsMeta(AbstractMetaManager.java:104)
        at com.aliyun.emr.rss.service.deploy.master.clustermeta.SingleMasterMetaManager.handleReleaseSlots(SingleMasterMetaManager.java:53)
        at com.aliyun.emr.rss.service.deploy.master.Master.handleReleaseSlots(Master.scala:456)
        at com.aliyun.emr.rss.service.deploy.master.Master$$anonfun$receiveAndReply$1.$anonfun$applyOrElse$12(Master.scala:189)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at com.aliyun.emr.rss.service.deploy.master.Master.executeWithLeaderChecker(Master.scala:156)
        at com.aliyun.emr.rss.service.deploy.master.Master$$anonfun$receiveAndReply$1.applyOrElse(Master.scala:189)
        at com.aliyun.emr.rss.common.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:110)
        at com.aliyun.emr.rss.common.rpc.netty.Inbox.safelyCall(Inbox.scala:214)
        at com.aliyun.emr.rss.common.rpc.netty.Inbox.process(Inbox.scala:107)
        at com.aliyun.emr.rss.common.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:222)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
```
2022-08-13 12:44:03 +08:00
Ethan Feng
f3bcb7f6a8
[ISSUE-146]update slots distribution mechanism (#273) 2022-08-12 23:38:19 +08:00
nafiy
eeda030599
Add metrics for marking active master (#307) 2022-08-07 18:00:49 +08:00
Cheng Pan
d01ee81ee6
Bump Ratis 2.3.0 and related toolchains (#299) 2022-08-04 21:59:42 +08:00
dxheming
8e3f48ec12
Refactor deprecated netty ConcurrentSet (#285) 2022-07-27 20:35:46 +08:00
AngersZhuuuu
fe17914942
Refactor pom import issue (#277) 2022-07-25 17:49:55 +08:00
Keyong Zhou
6442f38a33
[ISSUE-267] Extend API to support more partition types: MapPartition,… (#268) 2022-07-17 16:28:37 +08:00
AngersZhuuuu
36cc234dd4
[ISSUE-246][REFACTOR] Refactor LifecycleManager to make it's code more clear and more readable (#252) 2022-07-12 15:37:49 +08:00
AngersZhuuuu
d2a0ad480e
[ISSUE-222][BUG] RequestSlotResponse/RegisterShuffleResponse should handle null issue (#226) 2022-07-08 12:33:40 +08:00
nafiy
6f8fb8747f
Modify argument class and add config (#212) 2022-07-01 23:17:24 +08:00
Keyong Zhou
49f2a00943
[ISSUE-208] Refine log levels (#210) 2022-07-01 14:57:30 +08:00
AngersZhuuuu
909e8b2f53
[ISSUE-190][BUG] After WorkerLost, response to worker heartbeat RPC to, then worker can clean the data. (#192) 2022-06-29 22:25:29 +08:00
AngersZhuuuu
3079d0ac7a
[ISSUE-176][BUG] Handle RegisterWorker use wrong worker info when trigger lost event (#177) 2022-06-28 18:13:33 +08:00
Ethan Feng
f78451b93d
fix an ArithmeticException. (#167) 2022-06-27 17:01:55 +08:00
mingji
d4d8eb3838
update pom version. 2022-06-24 14:28:42 +08:00
nafiy
491f89bbb5
[FEATURE]Add metrics source for JVM and CPU (#125)
* Add metrics source for JVM and CPU

* Fix scala style issue
2022-05-30 13:26:54 +08:00
AngersZhuuuu
730d0c4a97
[ISSUE-120] [BUG] Master‘s metrics of WorkerSlotsCount / WorkerSlotsUsed/ OverloadWorkerCount not update (#121)
2022-05-23 19:19:24 +08:00
Ethan Feng
ac645a464b
update netty and ratis version. (#115) 2022-05-19 11:25:55 +08:00
Ethan Feng
409da82964
[Bug]fix stuck under high memory pressure. (#90) 2022-04-14 18:53:39 +08:00
Ethan Feng
baa2836216
Add metrics: (#85)
1. shuffle fetch send data time.
2. open stream time.
3. memory critical count.
2022-04-02 15:05:27 +08:00
Ethan Feng
9ad8254b0a
AQE support. (#67) 2022-04-01 20:19:01 +08:00
AngersZhuuuu
eacb9a1217
Refactor the configuration (#72) 2022-03-11 12:03:43 +08:00
Angerszhuuuu
ba632cadfd
Fix Issue 63 2022-03-03 15:33:24 +08:00
wangshengjie123
710e4c2c0b
[BUG] Fix Rpc error: worker cannot send heartbeat to master (#54) 2022-02-15 17:47:29 +08:00
Ethan Feng
356a1952e4
Multi Client Support (#47)
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2022-01-29 22:28:06 +08:00
Tony Doen
302891a1b9
[BUG] ClusterLoadFallbackPolicy is not strictness when a shuffle with big partitions to register (#30) 2022-01-26 15:16:01 +08:00
zky.zhoukeyong
ba5920acde
Initial Commit for RSS 2021-12-28 20:57:35 +08:00