Angerszhuuuu
13ce04f8a1
[CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT ( #1462 )
2023-04-26 15:19:34 +08:00
Shuang
0b2e4877bd
[CELEBORN-553] Improve IO ( #1458 )
2023-04-25 21:14:06 +08:00
Ethan Feng
01d8d1079c
[CELEBORN-550][FLINK] Fix bufferQueue release and poll concurrent problem. ( #1455 )
2023-04-25 15:06:06 +08:00
Ethan Feng
537fc94df2
[CELEBORN-549] Update readme about deploy flink client. ( #1454 )
2023-04-24 21:03:53 +08:00
Ethan Feng
8584f1049f
Add DingTalk Group info. ( #1453 )
2023-04-24 10:11:24 +08:00
Shuang
343f1e62d2
[CELEBORN-537][FOLLOWUP] Fix blacklist potentially lost failure workers ( #1449 )
2023-04-23 10:16:21 +08:00
Angerszhuuuu
17ae0cd9b1
[CELEBORN-541][FOLLOWUP] handleGetReducerFileGroup occupy too much RPC thread cause other RPC can't been handled ( #1448 )
* [CELEBORN-541][PERF] handleGetReducerFileGroup occupy too much RPC thread cause other RPC can't been handled
2023-04-23 10:15:41 +08:00
Shuang
d68deecaaa
[CELEBORN-546][FLINK] Use autoIncrement partitionId replace encode(mapId, attemptId) for generating partitionId ( #1447 )
2023-04-22 16:33:22 +08:00
Angerszhuuuu
e3ae2f0e17
[CELEBORN-541][FOLLOWUP] handleGetReducerFileGroup occupy too much RPC thread cause other RPC can't been handled ( #1445 )
2023-04-21 17:26:52 +08:00
Angerszhuuuu
0c2d3e647d
[CELEBORN-532][METRICS] Refine push-related failure metrics ( #1442 )
2023-04-21 17:05:43 +08:00
Angerszhuuuu
16d193071f
[CELEBORN-541][PERF] handleGetReducerFileGroup occupy too much RPC thread cause other RPC can't been handled ( #1444 )
2023-04-21 17:04:52 +08:00
Angerszhuuuu
181c1bfcd6
[CELEBORN-524][PERF] CongestionControl call too much ChannelsLimiter onTrim cause CPU stuck or occupy too much CPU cause no cpu for handlePushData ( #1428 )
2023-04-21 15:44:56 +08:00
Angerszhuuuu
6830cb61ef
[CELEBORN-540][Refactor] Add config entity of celeborn.rpc.io.threads ( #1443 )
* [CELEBORN-540][CONF] Add config entity of celeborn.rpc.io.threads
2023-04-21 11:21:41 +08:00
Shuang
62d60de8c5
[CELEBORN-537] Improve blacklist compute & minor fix for Flink ( #1441 )
[CELEBORN-537] improve blacklist compute & minor fix for flink
2023-04-20 18:30:10 +08:00
Ethan Feng
6378a386d0
[CELEBORN-530][REFACTOR] Move stream manager and memory manager to worker module. ( #1439 )
2023-04-20 10:17:26 +08:00
zhongqiangchen
d531ec499e
[CELEBORN-533] Bootstrap scripts should use exec to avoid fork subprocess ( #1437 )
* [CELEBORN-533] Fix bug where, in k8s, SIGTERM can't be caught by the worker when the worker is shut down
* add exec command before start_master.sh
* fix master
2023-04-19 22:00:12 +08:00
Fu Chen
2d04850dd1
[CELEBORN-534] Respect the user's configured master host settings ( #1436 )
2023-04-19 14:39:54 +08:00
Ethan Feng
7937d96226
[CELEBORN-535][FLINK] Reduce message decoder overhead. ( #1438 )
2023-04-19 11:00:29 +08:00
Ethan Feng
8be82548e1
[CELEBORN-520][FLINK] Tune map partition reading performance. ( #1424 )
2023-04-17 16:47:09 +08:00
Angerszhuuuu
d53cf40728
[CELEBORN-528][REFACTOR] RegisterShuffle 's log should show clear belongs to which shuffle ( #1434 )
2023-04-17 16:19:29 +08:00
Shuang
412d10b7dc
[CELEBORN-479][FLINK] support stopTrackingAndReleasePartitions when worker is not available ( #1405 )
2023-04-17 14:44:24 +08:00
Angerszhuuuu
938aec0e9f
[CELEBORN-528][REFACTOR] limitZeroInFlight should show inflight target ( #1433 )
2023-04-17 11:53:34 +08:00
Angerszhuuuu
e319b99a1c
[CELEBORN-527][DOC] Fix incorrect monitor the arrangement of documents ( #1432 )
2023-04-17 11:12:19 +08:00
Angerszhuuuu
ecafbf41fc
[CELEBORN-516][FOLLOWUP] Remove removed RPC metrics in metric doc ( #1431 )
2023-04-17 10:46:04 +08:00
Angerszhuuuu
932ccd0841
[CELEBORN-523][REFACTOR] Remove unnecessary code in WorkerPartitionLocationInfo ( #1427 )
2023-04-15 22:36:48 +08:00
Shuang
a22c6ca749
[CELEBORN-521] correct exception and unify unRetryableException ( #1425 )
2023-04-15 22:27:28 +08:00
cxzl25
13f772e0c0
[CELEBORN-525] Fix wrong parameter celeborn.push.buffer.size
2023-04-14 20:45:25 +08:00
Cheng Pan
fb7b311c89
[CELEBORN-499] Move version specific resource to main repo ( #1429 )
* license
2023-04-14 16:20:51 +08:00
Rex(Hui) An
0b402b5903
[CELEBORN-522] Add worker consume speed metric
2023-04-14 13:38:49 +08:00
Angerszhuuuu
3a21362265
[CELEBORN-511][IMPROVE] Move onTrim tag to StorageManager to avoid frequent trim action ( #1415 )
2023-04-14 10:35:51 +08:00
Angerszhuuuu
480d7ac0d9
[CELEBORN-519][PERF] getMaster/SlaveLocation directly use uniqueId as key ( #1421 )
2023-04-13 21:53:33 +08:00
Angerszhuuuu
a51f3b28b2
[CELEBORN-502][FOLLOWUP] HeatbeatFromApplicationResponse should remove workersSnapshot from localBlacklist ( #1422 )
2023-04-13 11:58:29 +08:00
Ethan Feng
9cccfc9872
[CELEBORN-431][FLINK] Support dynamic buffer allocation in reading map partition. ( #1407 )
2023-04-13 10:37:47 +08:00
Angerszhuuuu
32b497973e
[CELEBORN-517][IMPROVEMENT] Optimize stopTimer/startTimer cpu cost ( #1419 )
2023-04-12 20:12:01 +08:00
zhongqiangchen
166562dd3c
[CELEBORN-518] fix bug that worker uses celeborn.master.metrics.prometheus.port in worker-statefulse ( #1418 )
2023-04-12 19:38:04 +08:00
Angerszhuuuu
da98ed9bea
[CELEBORN-516][PERF] Remove RPCSource since it cost too much CPU ( #1420 )
2023-04-12 18:47:06 +08:00
Shuang
c7e08ed22b
[CELEBORN-514][FLINK] RssBufferStream need guarantee close the stream. ( #1417 )
2023-04-12 18:35:33 +08:00
Angerszhuuuu
e5722126e9
[CELEBORN-502][REFACTOR] Merge GetBlacklistResponse to HeartbeatFromApplication ( #1408 )
2023-04-12 14:59:32 +08:00
Angerszhuuuu
f574a4dafa
[CELEBORN-512][IMPROVEMENT] Sort timestamp and show in date format ( #1416 )
2023-04-11 19:56:48 +08:00
Keyong Zhou
7dd2230a04
[CELEBORN-510][FLINK] DataPartitionReader.addBuffer should not call s… ( #1413 )
2023-04-07 18:17:55 +08:00
Shuang
9b2b8a01ec
[CELEBORN-507] don't set up worker endpoint when update meta and remove compare worker meta with workers ( #1412 )
2023-04-07 11:46:24 +08:00
Binjie Yang
bbd5d0ef5c
[CELEBORN-504] Remove useless fields and modify misnamed configurations in Values.yaml
2023-04-04 19:16:37 +08:00
Angerszhuuuu
cad2836e85
[CELEBORN-505] Fix typo of SHUFFLE_CHUCK_SIZE ( #1411 )
2023-04-04 19:15:30 +08:00
Shuang
a892640353
[CELEBORN-503][FLINK] fix attempt task may use wrong partitionId. ( #1409 )
2023-04-04 15:46:35 +08:00
xunxunmimi5577
c3e8189e62
[CELEBORN-495] Leader should step down when its metadata directory has IO exception ( #1402 )
2023-04-03 17:48:41 +08:00
Angerszhuuuu
015788dd28
[CELEBORN-484][FOLLOWUP] Return shutting worker is empty also need to retain LifecycleManager's shutting workers ( #1403 )
2023-04-03 16:37:46 +08:00
Keyong Zhou
2e1598c011
[CELEBORN-485] Make celeborn.push.replicate.enabled default to false ( #1394 )
2023-04-03 16:36:29 +08:00
Angerszhuuuu
bf46336d54
[CELEBORN-487][PERF] ShuffleClientSide support blacklist to avoid client side timeout in same worker multiple times ( #1399 )
2023-04-03 11:50:04 +08:00
Angerszhuuuu
b4f8ab19bd
[CELEBORN-484][PERF] Master trigger LifecycleManager commit shutdown worker's partition location. ( #1395 )
2023-04-02 09:18:12 +08:00
Keyong Zhou
61416a828d
[CELEBORN-497]Fix and enable JDK 11 for CI ( #1401 )
2023-03-31 13:39:02 +08:00