Commit Graph

493 Commits

Author SHA1 Message Date
Angerszhuuuu
791d72d45f
[CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR (#1494) 2023-05-17 17:57:27 +08:00
Angerszhuuuu
7c6cb2f3bb
[CELEBORN-588] Remove test conf's category (#1491) 2023-05-17 17:37:28 +08:00
Angerszhuuuu
64a3534f71
[CELEBORN-584] Worker side should expose push/replicate/fetch Netty allocator metrics (#1489) 2023-05-16 17:51:33 +08:00
Shuang
f83304c337
[CELEBORN-581][Flink] Support JobManager failover. (#1485) 2023-05-16 14:51:53 +08:00
Angerszhuuuu
d657f8268a
[CELEBORN-586] Add SystemMiscSource to indicate system running status (#1488) 2023-05-16 14:03:07 +08:00
zhongqiangchen
5769c3fdc7
[CELEBORN-552] Add HeartBeat between the client and worker to keep alive (#1457) 2023-05-10 19:35:51 +08:00
Shuang
fb753fd48e
[CELEBORN-573] Guarantee resource/app/worker change persistent to raft in Ha Mode. (#1477) 2023-05-10 14:28:52 +08:00
Angerszhuuuu
778b5440bc
[CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout (#1461) 2023-05-10 12:29:06 +08:00
Shuang
2fea818fa8
[CELEBORN-579] revert Destroy Message rename for compatibility. (#1482) 2023-05-09 15:24:02 +08:00
Ethan Feng
3e0d779962
[CELEBORN-576] Add static identity provider and manually settable identity provider for non-hadoop environment. (#1480) 2023-05-08 17:29:01 +08:00
Angerszhuuuu
a315a2eb41
[CELEBORN-575] PartitionLocationInfo change cause quick upgrade impacted (#1479) 2023-05-08 16:56:42 +08:00
Angerszhuuuu
ef4c12e0fe
[CELEBORN-565] FETCH_MAX_RETRIES should double when enable replicates (#1471) 2023-04-28 14:27:35 +08:00
Angerszhuuuu
bfce6052d7
[CELEBORN-560][FOLLOWUP] Follow the original design for handling rerun & speculative task after handleStageEnd (#1468) 2023-04-28 11:18:42 +08:00
Angerszhuuuu
7a4f2ebd8a
[CELEBORN-547] Refactor request related API (#1452) 2023-04-27 16:25:41 +08:00
Angerszhuuuu
be84e8ba0d
[CELEBORN-562][REFACTOR] Rename Destroy and DestroyResponse to make it more clear (#1467) 2023-04-27 12:31:32 +08:00
Shuang
64a4f7274c
[CELEBORN-554][Tuning] Improve For LM to avoid reserve/commit empty worker resources (#1459) 2023-04-26 18:04:50 +08:00
Angerszhuuuu
13ce04f8a1
[CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT (#1462) 2023-04-26 15:19:34 +08:00
Shuang
0b2e4877bd
[CELEBORN-553] Improve IO (#1458) 2023-04-25 21:14:06 +08:00
Shuang
d68deecaaa
[CELEBORN-546][FLINK] Use autoIncrement partitionId replace encode(mapId, attemptId) for generating partitionId (#1447) 2023-04-22 16:33:22 +08:00
Angerszhuuuu
181c1bfcd6
[CELEBORN-524][PERF] CongestionControl call too much ChannelsLimiter onTrim cause CPU stuck or occupy too much CPU cause no cpu for handlePushData (#1428) 2023-04-21 15:44:56 +08:00
Angerszhuuuu
6830cb61ef
[CELEBORN-540][Refactor] Add config entity of celeborn.rpc.io.threads (#1443)
* [CELEBORN-540][CONF] Add config entity of celeborn.rpc.io.threads
2023-04-21 11:21:41 +08:00
Shuang
62d60de8c5
[CELEBORN-537] Improve blacklist compute & minor fix for Flink (#1441) 2023-04-20 18:30:10 +08:00
Ethan Feng
6378a386d0
[CELEBORN-530][REFACTOR] Move stream manager and memory manager to worker module. (#1439) 2023-04-20 10:17:26 +08:00
Ethan Feng
8be82548e1
[CELEBORN-520][FLINK] Tune map partition reading performance. (#1424) 2023-04-17 16:47:09 +08:00
Shuang
412d10b7dc
[CELEBORN-479][FLINK] support stopTrackingAndReleasePartitions when worker is not available (#1405) 2023-04-17 14:44:24 +08:00
Angerszhuuuu
938aec0e9f
[CELEBORN-528][REFACTOR] limitZeroInFlight should show inflight target (#1433) 2023-04-17 11:53:34 +08:00
Angerszhuuuu
932ccd0841
[CELEBORN-523][REFACTOR] Remove unnecessary code in WorkerPartitionLocationInfo (#1427) 2023-04-15 22:36:48 +08:00
Shuang
a22c6ca749
[CELEBORN-521] correct exception and unify unRetryableException (#1425) 2023-04-15 22:27:28 +08:00
Angerszhuuuu
3a21362265
[CELEBORN-511][IMPROVE] Move onTrim tag to StorageManager to avoid frequent trim action (#1415) 2023-04-14 10:35:51 +08:00
Angerszhuuuu
480d7ac0d9
[CELEBORN-519][PERF] getMaster/SlaveLocation directly use uniqueId as key (#1421) 2023-04-13 21:53:33 +08:00
Ethan Feng
9cccfc9872
[CELEBORN-431][FLINK] Support dynamic buffer allocation in reading map partition. (#1407) 2023-04-13 10:37:47 +08:00
Angerszhuuuu
32b497973e
[CELEBORN-517][IMPROVEMENT] Optimize stopTimer/startTimer cpu cost (#1419) 2023-04-12 20:12:01 +08:00
Angerszhuuuu
da98ed9bea
[CELEBORN-516][PERF] Remove RPCSource since it cost too much CPU (#1420) 2023-04-12 18:47:06 +08:00
Angerszhuuuu
e5722126e9
[CELEBORN-502][REFACTOR] Merge GetBlacklistResponse to HeartbeatFromApplication (#1408) 2023-04-12 14:59:32 +08:00
Keyong Zhou
7dd2230a04
[CELEBORN-510][FLINK] DataPartitionReader.addBuffer should not call s… (#1413) 2023-04-07 18:17:55 +08:00
Shuang
9b2b8a01ec
[CELEBORN-507] don't set up worker endpoint when update meta and remove compare worker meta with workers (#1412) 2023-04-07 11:46:24 +08:00
Angerszhuuuu
cad2836e85
[CELEBORN-505] Fix typo of SHUFFLE_CHUCK_SIZE (#1411) 2023-04-04 19:15:30 +08:00
Keyong Zhou
2e1598c011
[CELEBORN-485] Make celeborn.push.replicate.enabled default to false (#1394) 2023-04-03 16:36:29 +08:00
Angerszhuuuu
bf46336d54
[CELEBORN-487][PERF] ShuffleClientSide support blacklist to avoid client side timeout in same worker multiple times (#1399) 2023-04-03 11:50:04 +08:00
Angerszhuuuu
b4f8ab19bd
[CELEBORN-484][PERF] Master trigger LifecycleManager commit shutdown worker's partition location. (#1395) 2023-04-02 09:18:12 +08:00
Keyong Zhou
61416a828d
[CELEBORN-497]Fix and enable JDK 11 for CI (#1401) 2023-03-31 13:39:02 +08:00
Shuang
45013b8bae
[CELEBORN-489][FLINK]fix retry client for open stream (#1397) 2023-03-30 11:44:19 +08:00
zhongqiangchen
cd92c423cd
[CELEBORN-475] Support extra tags for prometheus metrics (#1385) 2023-03-28 21:22:28 +08:00
Ethan Feng
6cee85748d
[CELEBORN-477][FLINK] Report failed partition to flink framework. (#1391) 2023-03-28 15:54:37 +08:00
Keyong Zhou
cb19ed1c66
[CELEBORN-479][PERF] Refactor DataPushQueue.takePushTask to avoid busy wait (#1386) 2023-03-27 16:18:55 +08:00
Fei Wang
b40c573069
[CELEBORN-474][FOLLOWUP] Using inner static ConcurrentHashMap class and only apply for JDK8 (#1384) 2023-03-27 16:16:23 +08:00
Fei Wang
7c444cb0c5
[CELEBORN-474] Speed up ConcurrentHashMap#computeIfAbsent (#1383) 2023-03-26 09:41:59 +08:00
Fei Wang
c609c0ebaa
[MINOR] Fix typo and remove unused code (#1381)
* fix typo

* remove unused
2023-03-25 23:20:33 +08:00
Angerszhuuuu
acf6fd3bd2
[CELEBORN-345] TransportResponseHandler create too much thread (#1373) 2023-03-24 17:16:26 +08:00
Shuang
89b3f3887d
[CELEBORN-356] [FLINK] Support release single partition resource (#1314) 2023-03-24 17:15:28 +08:00
Keyong Zhou
2bfa7e8965
[CELEBORN-466][FLINK] ReadBufferDispatcher.recycle should log error when refCnt != 1 (#1377) 2023-03-23 20:33:28 +08:00
Lianne Li
a071bdf6d7
[CELEBORN-449] Repair the hdfs path regex (#1367)
* Path protocols are all started with xxx://, and is unnecessary to restrict the content after that. Actually, it makes an error when write shuffle files which like "xxx://abc/shuffle/hadoop/rss-worker/shuffle_data/spark-0de72e2ce2e24f6db69c2228dd12a514/0/0-0-0"

---------

Co-authored-by: ming.li2 <ming.li2@dmall.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Ethan Feng <ethanfeng@apache.org>
2023-03-23 12:40:41 +08:00
Keyong Zhou
885e0cef32
[CELEBORN-459] Remove chunkTracker from FileManagedBuffers to avoid conflict with stream reuse (#1372) 2023-03-22 11:43:38 +08:00
Keyong Zhou
3d6fba553b
[CELEBORN-454] Code refine for worker (#1371) 2023-03-22 10:39:14 +08:00
Angerszhuuuu
f16c7b414e
[CELEBORN-445] Add CelebornRackResolver to support rack reoslve (#1366) 2023-03-21 16:40:46 +08:00
Angerszhuuuu
56d796638f
[CELEBORN-438] Move ServletPath to MetricsSytsem (#1364) 2023-03-20 18:22:40 +08:00
Keyong Zhou
9401db2bc8
[CELEBORN-443] Code refine for client and common (#1362) 2023-03-20 10:37:43 +08:00
乐活优格
0b78c6d325
[CELEBORN-442]Support hdfs compatible file system (#1360) 2023-03-18 11:47:46 +08:00
Ethan Feng
0ebad677d7
[CELEBORN-434] Add constrain about memory manager's parameters. (#1356) 2023-03-17 15:14:03 +08:00
Ethan Feng
6f317c77ee
[CELEBORN-422][FLINK] Remove unused fields in ReadData. (#1347) 2023-03-14 19:49:00 +08:00
Shuang
1fa00c0317
[CELEBORN-391][FLINK][FOLLOW UP] fix clean stream twice & refine log & add ut (#1344) 2023-03-14 15:38:49 +08:00
Shuang
cd5241d399
[CELEBORN-381][FLINK] notify the task with the error message when channel in active. (#1341) 2023-03-14 11:28:03 +08:00
Ethan Feng
971c93d4d9
[CELEBORN-419][FLINK] Fix memory leak when receive RPCs with body. (#1343) 2023-03-14 11:27:36 +08:00
Ethan Feng
2385215578
[CELEBORN-394] Refine memory manager's log. (#1332) 2023-03-13 15:13:33 +08:00
Angerszhuuuu
b56624d3c1
[CELEBORN-405] Add metrics about lost workers (#1330) 2023-03-13 14:49:49 +08:00
Ethan Feng
c78023824a
[CELEBORN-397][FLINK] Flink plugin support UnpooledByteBufAllocator. (#1324) 2023-03-13 11:36:13 +08:00
Ethan Feng
bb8401e401
[CELEBORN-403][FLINK] Add metrics about buffer dispatcher request queue length. (#1329) 2023-03-13 11:15:00 +08:00
Angerszhuuuu
a336f12cc8
[CELEBORN-400] Add RPC metrics for OpenStream (#1326) 2023-03-10 21:22:05 +08:00
Angerszhuuuu
4b334df7a6
[CELEBORN-399] Make fileSorterExecutors thread num can be customized (#1325) 2023-03-10 21:10:43 +08:00
Shuang
ec745e36d1
[CELEBORN-391][Flink] Refine register/release synchronization (#1321) 2023-03-09 20:00:50 +08:00
Ethan Feng
aebb870d08
[CELEBORN-386][FLINK] Async open DataPartitionReader to release Netty thread earlier. (#1318) 2023-03-09 12:31:01 +08:00
Ethan Feng
675a7da393
[CELEBORN-368][FLINK] Pass exceptions in buffer stream. (#1304) 2023-03-03 15:43:30 +08:00
Keyong Zhou
9aabb43699
[CELEBORN-372] Remove the standard Apache License header from the top of third-party source files (#1301) 2023-03-02 19:07:01 +08:00
Keyong Zhou
dcedf7b0a9
[CELEBORN-348] Support fetchTime in load-aware slots assignment strategy (#1287) 2023-03-02 18:31:50 +08:00
Ethan Feng
d4af8fd094
[CELEBORN-353][FLINK] Fix incorrect read buffer metric. (#1288) 2023-03-01 11:08:13 +08:00
zhongqiangchen
cb76c4de4c
[CELEBORN-350][FLINK] Add PluginConf to be compatible with old configurations 2023-02-28 20:36:11 +08:00
jiaoqingbo
7dc1ab13db
[CELEBORN-351] Add \n to the log to make log print clearer (#1285) 2023-02-28 17:55:17 +08:00
Shuang
5654c62f35
[CELEBORN-347][Flink] fix memory leak and refactor BufferStreamManager (#1282) 2023-02-28 15:18:59 +08:00
Angerszhuuuu
eda21ead24
[CELEBORN-344] Change PUSH_DATA_FAIL_MASTER/SALVE to PUSH_DATA_WRITE_FAIL_MASTER/SALVE (#1281) 2023-02-28 11:29:40 +08:00
Keyong Zhou
7adf1fca41
[CELEBORN-295] Optimize data push (#1232)
* [CELEBORN-295] Add double buffer for sort pusher
2023-02-28 10:35:55 +08:00
Angerszhuuuu
24f5478adc
[CELEBORN-338] Clean duplicated exception message of handling push data (#1274) 2023-02-28 10:35:18 +08:00
Shuang
935806f036
[CELEBORN-341][Flink] cache file group for map partition in Flink plugin (#1277) 2023-02-26 20:31:20 +08:00
Keyong Zhou
3c8c58e09d
[CELEBORN-301] Refactor PartitionLocationInfo to use ConcurrentHashMap (#1278) 2023-02-26 16:46:30 +08:00
Ethan Feng
f0b9236ff2
[CELEBORN-340][FLINK] Reuse file channels in map partition read. (#1276) 2023-02-24 19:26:51 +08:00
Angerszhuuuu
a7587c3fe7
[CELEBORN-337] Remove unnecessary StatusCode.message (#1272) 2023-02-24 15:11:07 +08:00
zhongqiangchen
af9e8366c9
[CELEBORN-329] Add rpc address to exception message when failed to sendrpc (#1263) 2023-02-23 19:32:21 +08:00
Shuang
9754616d79
[CELEBORN-330] fix deadlock when use the same netty channel to receive data while other thread wait the response (#1265) 2023-02-23 17:57:43 +08:00
Angerszhuuuu
322f0d2b41
[CELEBORN-316] Wrap Celeborn exception with CelebornIOException (#1253) 2023-02-22 16:10:11 +08:00
Ethan Feng
1704aff95c
[CELEBORN-327][Flink] BufferStreamMananger should recycle buffer in reader thread. (#1261) 2023-02-22 16:02:58 +08:00
Ethan Feng
cb8df62ec5
[CELEBORN-324][FLINK] Flink plugin needs reuse connections. (#1257) 2023-02-21 18:32:00 +08:00
Shuang
1b1517c7b4
[CELEBORN-323] readBuffers need synchronized as recycle buffer will call readers in multiple threads (#1256) 2023-02-21 15:58:19 +08:00
Ethan Feng
5dd5e97225
[CELEBORN-322][Flink] Copy out message if it‘s readData only. (#1255) 2023-02-21 15:51:13 +08:00
Ethan Feng
c649655933
Revert "[CELEBORN-322][Flink] Copy out message if it‘s readData only."
This reverts commit 0aa37ed7d3.
2023-02-21 14:48:08 +08:00
Ethan Feng
0aa37ed7d3
[CELEBORN-322][Flink] Copy out message if it‘s readData only. 2023-02-21 14:45:39 +08:00
Ethan Feng
d7798127c9
[CELEBORN-319] FlinkTransportClient should not reuse connection. (#1252) 2023-02-21 11:16:30 +08:00
Shuang
cf833e568c
[CELEBORN-318] fix deadlock & bugs in bufferStreamManager (#1251) 2023-02-21 11:12:16 +08:00
Shuang
a6103e4bf8
[CELEBORN-317] add REGISTER_MAP_PARTITION_TASK message type (#1250) 2023-02-20 22:01:35 +08:00
Ethan Feng
7e9ba19d58
[CELEBORN-302] Fix workers count out of sync in HA mode. (#1239) 2023-02-20 21:46:33 +08:00
zhongqiangchen
b5dc106af8
[CELEBORN-291] optimize shuffleclientimpl creating client and pushdata for mappartition (#1224) 2023-02-17 19:07:19 +08:00
Ethan Feng
0c8bb83114
[CELEBORN-234] Implement buffer stream. (#1221) 2023-02-17 17:38:36 +08:00
Ethan Feng
3aacede5f8
[CELEBORN-283] Derive network layer for flink plugin. (#1222) 2023-02-17 14:12:54 +08:00
zhongqiangchen
5236df68af
[CELEBORN-292] optimize mappartitionfilewriter flushing index and reading data header (#1225) 2023-02-17 13:42:28 +08:00
Ethan Feng
1dcfdb0c8f
[CELEBORN-281] Add metrics about buffer stream read buffer. (#1216) 2023-02-17 11:20:07 +08:00
Keyong Zhou
89b4eab3b6
[CELEBORN-309] Fix some potential concurrent issues in InFlightRequestTracker (#1243) 2023-02-17 10:01:19 +08:00
Angerszhuuuu
57f775a7e9
[CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistence (#1208) 2023-02-16 18:27:37 +08:00
Ethan Feng
a364fb27b2
[CELEBORN-282] Add BacklogAnnouncement RPC. (#1217) 2023-02-16 14:58:39 +08:00
Ethan Feng
534853bf8a
[CELEBORN-278] Add openStreamWithCredit RPC. (#1214) 2023-02-16 14:07:13 +08:00
jiaoqingbo
bd9e0ddc1f
[CELEBORN-304] Missing setIfMissing celeborn.$module.io.serverThreads (#1238) 2023-02-15 15:49:08 +08:00
Rex(Hui) An
2068e6ae37
[CELEBORN-279] Add user level push data speed metric (#1213) 2023-02-13 12:04:44 +08:00
jiaoqingbo
3a92b0d911
[CELEBORN-284] fix typo in CelebornConf (#1218)
Co-authored-by: jiaoqb <jiaoqb@asiainfo.com>
2023-02-10 14:59:36 +08:00
Angerszhuuuu
dae58a664c
[CELEBORN-239][FOLLOWUP] PUSH_DATA_TIMEOUT_MASTER/SLAVE should support convert through RPC (#1211) 2023-02-08 17:16:29 +08:00
Rex(Hui) An
bff6e91e0b
[CELEBORN-227] Support different push strategies to control the push speed (#1167) 2023-02-07 14:24:30 +08:00
Rex(Hui) An
bb113ec9be
[CELEBORN-207] Support network congestion control (#1066) 2023-02-07 12:06:18 +08:00
Shuang
2634476758
[CELEBORN-267] reuse stream when client channel reconnected (#1200) 2023-02-03 15:12:45 +08:00
Angerszhuuuu
4b6f7e4593
[CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too (#1185) 2023-02-03 11:53:15 +08:00
Angerszhuuuu
04427f2b16
[CELEBORN-247] Add metrics for each user's quota usage (#1182) 2023-02-02 18:31:08 +08:00
Ethan Feng
a43e3141bc
[CELEBORN-224][FOLLOWUP] Correct license and notices. (#1189) 2023-02-02 10:52:11 +08:00
Angerszhuuuu
98a5a3e16e
[CELEBORN-257][IMPROVEMENT] Avoid one hash searching when process message in TransportResponseHandler (#1193) 2023-02-01 14:59:53 +08:00
Angerszhuuuu
9ce48a648f
[CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs (#1190) 2023-02-01 11:12:16 +08:00
Shuang
7162be2fae
[CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker (#1149) 2023-01-31 18:53:36 +08:00
Rex(Hui) An
6e82e7dd6c
[CELEBORN-253][MINOR] Fix the wrongly resolve celeborn.ha.master.node.id issue if enable HA (#1188) 2023-01-31 15:39:58 +08:00
Angerszhuuuu
1311fb53d1
[CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should have their own ERROR type (#1181)
* [CELEBORN-243][IMPROVEMENT] Create push client failed should have a ERROR type
2023-01-30 17:47:22 +08:00
Angerszhuuuu
122da47815
[CELEBORN-241][IMPROVEMENT] limit inflight push timeout should > push data timeout (#1179) 2023-01-30 11:57:07 +08:00
Kaijie Chen
3da338a716
[CELEBORN-248] Non-ASCII characters in source code (#1183) 2023-01-29 21:07:41 +08:00
nafiy
d6d537df93
[CELEBORN-229][FOLLOWUP] Support collect metrics with customized labels (#1174) 2023-01-28 16:02:58 +08:00
Keyong Zhou
e47f1e33b0
[CELEBORN-55][FOLLOWUP] Code refine (#1175) 2023-01-20 16:22:47 +08:00
zy.jordan
c5be79ee3d
[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker (#1102) 2023-01-20 10:18:45 +08:00
nafiy
e09b629da2
[CELEBORN-229][FEATURE] Support collect metrics with customized labels (#1173) 2023-01-19 11:59:48 +08:00
Kaijie Chen
2b6822e3c7
[CELEBORN-230] AppDiskUsageSnapShot overrides equals() without override hashCode() (#1172) 2023-01-18 17:21:32 +08:00
Ethan Feng
a239f9f284
[CELEBORN-228]Refactor PartitionFileSorter to avoid specific JDK dependency. (#1168) 2023-01-16 20:06:47 +08:00
zy.jordan
bb96700415
[CELEBORN-223] The default rpc thread num of pushServer/replicateServer/fetchServer should be the number of total of Flusher's thread (#1163) 2023-01-16 12:03:46 +08:00
Keyong Zhou
fa7ba43136
[CELEBORN-225] Add global default configuration for number of flusher… (#1165) 2023-01-14 13:20:44 +08:00
zhongqiangczq
411ab09ffb
[CELEBORN-158][Flink] Add ShuffleServiceFactory to Support MapPartition in … (#1105) 2023-01-13 16:38:46 +08:00
Shuang
810a8d01e0
[CELEBORN-212] refresh client if current client is inactive. (#1159) 2023-01-11 11:54:50 +08:00
Shuang
1332362bff
[CELEBORN-213] Add configuration for whether to close idle connections in client side (#1157) 2023-01-10 19:13:33 +08:00
zy.jordan
19197b9190
[CELEBORN-214] Push/Replicate/Fetch io threads default value is 16 (#1158) 2023-01-10 17:46:56 +08:00
Angerszhuuuu
e155ec122a
[CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice (#1136) 2023-01-10 11:39:40 +08:00
Angerszhuuuu
0d5809ff0c
[CELEBORN-192][IMPROVEMENT] Change FAILED status to REQUEST_FAILED since it's all used when RPC request failed. (#1139) 2023-01-06 16:53:04 +08:00
Ethan Feng
5595f2f4b3
[CELEBORN-124]Add buffer stream. (#1069) 2023-01-06 15:54:52 +08:00
Angerszhuuuu
415452d9c4
[CELEBORN-189][IMPROVEMENT] PushDataFailedSlave should add slave worker to blacklist (#1135) 2023-01-05 20:12:07 +08:00
Fu Chen
ab449ffdd7
[CELEBORN-198] Fix the wrong configuration path of plugin protobuf-maven-plugin and … (#1146) 2023-01-05 20:09:31 +08:00
Cheng Pan
b8758a7cb6
[CELEBORN-181][TEST] Rename RssFunSuite to CelebornFunSuite (#1125) 2022-12-29 18:10:14 +08:00
RexAn
6432a129be
[CELEBORN-61][CELEBORN-62][FOLLOW_UP] Fix some issues for slow start (#1119) 2022-12-29 12:07:20 +08:00
Angerszhuuuu
b13ddac9d2
[CELEBORN-172][Refactor] Load/Make snapshot use Protobuf serde (#1118) 2022-12-29 11:51:14 +08:00
Angerszhuuuu
829f35c753
[CELEBORN-176][BUG] Fix wrong alternative conf of celeborn.worker.flusher.ssd.threads (#1121) 2022-12-29 11:11:20 +08:00
Angerszhuuuu
5603e62e95
[CELEBORN-174][REFACTOR] Move AppDiskUsage related to meta package (#1117) 2022-12-27 15:24:42 +08:00
Ethan Feng
3cdc25286d
[CELEBORN-165] Fix ut RetryCommitFilesTest failure. (#1111) 2022-12-22 11:39:40 +08:00
Ethan Feng
5aa959a335
[CELEBORN-157] Change prefix of configurations to celeborn. (#1104) 2022-12-21 15:17:28 +08:00
nafiy
f13dfb7421
[CELEBORN-113][FEATURE] Add metrics to monitor non-critical error number on local device (#1100) 2022-12-20 22:30:55 +08:00
Keyong Zhou
2f0682265e
[CELEBORN-119] Add timeout for pushdata (#1097) 2022-12-20 20:40:42 +08:00