Commit Graph

409 Commits

Author SHA1 Message Date
Angerszhuuuu
e18a5ea769
[CELEBORN-624] StorageManager should only remove expired app dirs (#1531) 2023-06-02 11:33:33 +08:00
Ethan Feng
d33916e571
[CELEBORN-625] Add a config to enable/disable UnsafeRow fast write. (#1532) 2023-06-01 20:55:45 +08:00
Angerszhuuuu
cf308aa057
[CELEBORN-595] Refine code frame of CelebornConf (#1525) 2023-06-01 10:37:58 +08:00
Angerszhuuuu
6d5dd50915
[CELEBORN-595][FOLLOWUP] Fix change version to 0.3.0. (#1522) 2023-05-30 20:12:56 +08:00
Angerszhuuuu
62681ba85d
[CELEBORN-595] Rename and refactor the configuration doc. (#1501) 2023-05-30 15:14:12 +08:00
zhongqiangchen
f117cff776
[CELEBORN-618] [FLINK] worker side adds partition split configuration options (#1520) 2023-05-30 14:13:31 +08:00
Angerszhuuuu
07011f5a4d
[CELEBORN-601] Consolidate configsWithAlternatives with ConfigBuilder.withAlternative (#1506) 2023-05-28 09:13:05 +08:00
Leo Li
de97ad26ce
[CELEBORN-599] Consolidate calculation of mount point (#1505)
* [CELEBORN-599] Fix worker dirs get mount point

* update

* update

---------

Co-authored-by: liyihe <liyihe@bigo.sg>
2023-05-23 14:06:02 +08:00
Angerszhuuuu
6619015a63
[CELEBORN-596] Worker don't need to update disk max slots (#1502) 2023-05-23 10:30:35 +08:00
Angerszhuuuu
d244f44518
[CELEBORN-593] Refine some RPC related default configurations (#1498) 2023-05-19 18:23:12 +08:00
Angerszhuuuu
615d9a111f
[CELEBORN-487] Remove wrong space of config SHUFFLE_CLIENT_PUSH_BLACK (#1500) 2023-05-19 14:27:57 +08:00
Ethan Feng
ac78afdc4e
[CELEBORN-594] Eliminate Ratis noisy logs. (#1499) 2023-05-19 14:05:52 +08:00
Angerszhuuuu
42219aeb2a
[CELEBORN-592][REFACTOR] Refactor PbSerDeUtils's some foreach code format (#1497) 2023-05-18 16:22:14 +08:00
Shuang
6eabc519b3
[CELEBORN-591] RatisSystem need decrease no leader timeout configuration. (#1495) 2023-05-18 14:49:06 +08:00
Angerszhuuuu
811e192bbd
[CELEBORN-446] Support rack aware during assign slots for ROUNDROBIN (#1370) 2023-05-18 13:58:51 +08:00
Ethan Feng
7015d2463a
[CELEBORN-583] Merge pooled memory allocators. (#1490) 2023-05-18 10:37:30 +08:00
Angerszhuuuu
791d72d45f
[CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR (#1494) 2023-05-17 17:57:27 +08:00
Angerszhuuuu
7c6cb2f3bb
[CELEBORN-588] Remove test conf's category (#1491) 2023-05-17 17:37:28 +08:00
Angerszhuuuu
64a3534f71
[CELEBORN-584] Worker side should expose push/replicate/fetch Netty allocator metrics (#1489) 2023-05-16 17:51:33 +08:00
Shuang
f83304c337
[CELEBORN-581][Flink] Support JobManager failover. (#1485) 2023-05-16 14:51:53 +08:00
Angerszhuuuu
d657f8268a
[CELEBORN-586] Add SystemMiscSource to indicate system running status (#1488) 2023-05-16 14:03:07 +08:00
zhongqiangchen
5769c3fdc7
[CELEBORN-552] Add HeartBeat between the client and worker to keep alive (#1457) 2023-05-10 19:35:51 +08:00
Shuang
fb753fd48e
[CELEBORN-573] Guarantee resource/app/worker change persistent to raft in Ha Mode. (#1477) 2023-05-10 14:28:52 +08:00
Angerszhuuuu
778b5440bc
[CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout (#1461) 2023-05-10 12:29:06 +08:00
Shuang
2fea818fa8
[CELEBORN-579] revert Destroy Message rename for compatibility. (#1482) 2023-05-09 15:24:02 +08:00
Ethan Feng
3e0d779962
[CELEBORN-576] Add static identity provider and manually settable identity provider for non-hadoop environment. (#1480) 2023-05-08 17:29:01 +08:00
Angerszhuuuu
a315a2eb41
[CELEBORN-575] PartitionLocationInfo change cause quick upgrade impacted (#1479) 2023-05-08 16:56:42 +08:00
Angerszhuuuu
ef4c12e0fe
[CELEBORN-565] FETCH_MAX_RETRIES should double when enable replicates (#1471) 2023-04-28 14:27:35 +08:00
Angerszhuuuu
bfce6052d7
[CELEBORN-560][FOLLOWUP] Follow the original design for handling rerun & speculative task after handleStageEnd (#1468) 2023-04-28 11:18:42 +08:00
Angerszhuuuu
7a4f2ebd8a
[CELEBORN-547] Refactor request related API (#1452) 2023-04-27 16:25:41 +08:00
Angerszhuuuu
be84e8ba0d
[CELEBORN-562][REFACTOR] Rename Destroy and DestroyResponse to make it more clear (#1467) 2023-04-27 12:31:32 +08:00
Shuang
64a4f7274c
[CELEBORN-554][Tuning] Improve For LM to avoid reserve/commit empty worker resources (#1459) 2023-04-26 18:04:50 +08:00
Angerszhuuuu
13ce04f8a1
[CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT (#1462) 2023-04-26 15:19:34 +08:00
Shuang
0b2e4877bd
[CELEBORN-553] Improve IO (#1458) 2023-04-25 21:14:06 +08:00
Shuang
d68deecaaa
[CELEBORN-546][FLINK] Use autoIncrement partitionId replace encode(mapId, attemptId) for generating partitionId (#1447) 2023-04-22 16:33:22 +08:00
Angerszhuuuu
181c1bfcd6
[CELEBORN-524][PERF] CongestionControl call too much ChannelsLimiter onTrim cause CPU stuck or occupy too much CPU cause no cpu for handlePushData (#1428) 2023-04-21 15:44:56 +08:00
Angerszhuuuu
6830cb61ef
[CELEBORN-540][Refactor] Add config entity of celeborn.rpc.io.threads (#1443)
* [CELEBORN-540][CONF] Add config entity of celeborn.rpc.io.threads
2023-04-21 11:21:41 +08:00
Shuang
62d60de8c5
[CELEBORN-537] Improve blacklist compute & minor fix for Flink (#1441)
[CELEBORN-537] improve blacklist compute & minor fix for flink
2023-04-20 18:30:10 +08:00
Ethan Feng
6378a386d0
[CELEBORN-530][REFACTOR] Move stream manager and memory manager to worker module. (#1439) 2023-04-20 10:17:26 +08:00
Ethan Feng
8be82548e1
[CELEBORN-520][FLINK] Tune map partition reading performance. (#1424) 2023-04-17 16:47:09 +08:00
Shuang
412d10b7dc
[CELEBORN-479][FLINK] support stopTrackingAndReleasePartitions when worker is not available (#1405) 2023-04-17 14:44:24 +08:00
Angerszhuuuu
938aec0e9f
[CELEBORN-528][REFACTOR] limitZeroInFlight should show inflight target (#1433) 2023-04-17 11:53:34 +08:00
Angerszhuuuu
932ccd0841
[CELEBORN-523][REFACTOR] Remove unnecessary code in WorkerPartitionLocationInfo (#1427) 2023-04-15 22:36:48 +08:00
Shuang
a22c6ca749
[CELEBORN-521] correct exception and unify unRetryableException (#1425) 2023-04-15 22:27:28 +08:00
Angerszhuuuu
3a21362265
[CELEBORN-511][IMPROVE] Move onTrim tag to StorageManager to avoid frequent trim action (#1415) 2023-04-14 10:35:51 +08:00
Angerszhuuuu
480d7ac0d9
[CELEBORN-519][PERF] getMaster/SlaveLocation directly use uniqueId as key (#1421) 2023-04-13 21:53:33 +08:00
Ethan Feng
9cccfc9872
[CELEBORN-431][FLINK] Support dynamic buffer allocation in reading map partition. (#1407) 2023-04-13 10:37:47 +08:00
Angerszhuuuu
32b497973e
[CELEBORN-517][IMPROVEMENT] Optimize stopTimer/startTimer cpu cost (#1419) 2023-04-12 20:12:01 +08:00
Angerszhuuuu
da98ed9bea
[CELEBORN-516][PERF] Remove RPCSource since it cost too much CPU (#1420) 2023-04-12 18:47:06 +08:00
Angerszhuuuu
e5722126e9
[CELEBORN-502][REFACTOR] Merge GetBlacklistResponse to HeartbeatFromApplication (#1408) 2023-04-12 14:59:32 +08:00