liyihe
188b069710
[CELEBORN-623][DOCS] Document how to change RPC type in celeborn-ratis
...
### What changes were proposed in this pull request?
Ratis-shell uses GRPC by default, but Celeborn also supports Netty for Ratis. If `raft.rpc.type` is not specified, commands may fail.
For example:
```
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 14.947369960s. [closed=[], open=[[buffered_nanos=14962358255, waiting_for_connection]]]
```
So I think we should update the documentation to explain how to change the RPC type in `celeborn-ratis`.
### Why are the changes needed?
Improve user experience
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually tested.
Closes #1530 from onebox-li/ratis-shell-default-rpc.
Lead-authored-by: liyihe <liyihe@bigo.sg>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-02 20:23:09 +08:00
Angerszhuuuu
e18a5ea769
[CELEBORN-624] StorageManager should only remove expired app dirs ( #1531 )
2023-06-02 11:33:33 +08:00
Ethan Feng
d33916e571
[CELEBORN-625] Add a config to enable/disable UnsafeRow fast write. ( #1532 )
2023-06-01 20:55:45 +08:00
Angerszhuuuu
cf308aa057
[CLEBORN-595] Refine code frame of CelebornConf ( #1525 )
2023-06-01 10:37:58 +08:00
Angerszhuuuu
6d5dd50915
[CELEBORN-595][FOLLOWUP] Fix change version to 0.3.0. ( #1522 )
2023-05-30 20:12:56 +08:00
Angerszhuuuu
62681ba85d
[CELEBORN-595] Rename and refactor the configuration doc. ( #1501 )
2023-05-30 15:14:12 +08:00
zhongqiangchen
f117cff776
[CELEBORN-618] [FLINK] worker side adds partition split configuration options ( #1520 )
2023-05-30 14:13:31 +08:00
Binjie Yang
d30f45ad63
[CELEBORN-450][HELM] Configurable volumes in the values.yaml ( #1508 )
...
* [CELEBORN-450] Configure the mount & volume in the Values.yaml
* fix comments
* fix wrong name
* fix comments
* fix typo
* fix into array
* With User Note Comments
* fix comments
* Update charts/celeborn/templates/worker-statefulset.yaml
---------
Co-authored-by: Cheng Pan <pan3793@gmail.com>
2023-05-29 13:48:23 +08:00
Angerszhuuuu
d244f44518
[CELEBORN-593] Refine some RPC related default configurations ( #1498 )
2023-05-19 18:23:12 +08:00
Angerszhuuuu
615d9a111f
[CELEBORN-487] Remove wrong space of config SHUFFLE_CLIENT_PUSH_BLACK ( #1500 )
2023-05-19 14:27:57 +08:00
Angerszhuuuu
811e192bbd
[CELEBORN-446] Support rack aware during assign slots for ROUNDROBIN ( #1370 )
2023-05-18 13:58:51 +08:00
Ethan Feng
7015d2463a
[CELEBORN-583] Merge pooled memory allocators. ( #1490 )
2023-05-18 10:37:30 +08:00
Angerszhuuuu
791d72d45f
[CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR ( #1494 )
2023-05-17 17:57:27 +08:00
Angerszhuuuu
7c6cb2f3bb
[CELEBORN-588] Remove test conf's category ( #1491 )
2023-05-17 17:37:28 +08:00
Angerszhuuuu
64a3534f71
[CELEBORN-584] Worker side should expose push/replicate/fetch Netty allocator metrics ( #1489 )
2023-05-16 17:51:33 +08:00
Angerszhuuuu
d657f8268a
[CELEBORN-586] Add SystemMiscSource to indicate system running status ( #1488 )
2023-05-16 14:03:07 +08:00
zhongqiangchen
5769c3fdc7
[CELEBORN-552] Add HeartBeat between the client and worker to keep alive ( #1457 )
2023-05-10 19:35:51 +08:00
Angerszhuuuu
778b5440bc
[CELEBORN-556][BUG] ReserveSlot should not use default RPC time out since register shuffle max timeout is network timeout ( #1461 )
2023-05-10 12:29:06 +08:00
Ethan Feng
3e0d779962
[CELEBORN-576] Add static identity provider and manually settable identity provider for non-hadoop environment. ( #1480 )
2023-05-08 17:29:01 +08:00
Ethan Feng
91b757555e
[CELEBORN-570] Update docs about monitor and deployment. ( #1478 )
2023-05-08 17:07:42 +08:00
Angerszhuuuu
ef4c12e0fe
[CELEBORN-565] FETCH_MAX_RETRIES should double when enable replicates ( #1471 )
2023-04-28 14:27:35 +08:00
Angerszhuuuu
13ce04f8a1
[CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT ( #1462 )
...
* [CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT
2023-04-26 15:19:34 +08:00
Shuang
0b2e4877bd
[CELEBORN-553] Improve IO ( #1458 )
2023-04-25 21:14:06 +08:00
Angerszhuuuu
0c2d3e647d
[CELEBORN-532][METRICS] Refine push-related failure metrics ( #1442 )
...
* [CELEBORN-532][METRICS] Refine push-related failure metrics
2023-04-21 17:05:43 +08:00
Angerszhuuuu
181c1bfcd6
[CELEBORN-524][PERF] CongestionControl call too much ChannelsLimiter onTrim cause CPU stuck or occupy too much CPU cause no cpu for handlePushData ( #1428 )
2023-04-21 15:44:56 +08:00
Angerszhuuuu
6830cb61ef
[CELEBORN-540][Refactor] Add config entity of celeborn.rpc.io.threads ( #1443 )
...
* [CELEBORN-540][CONF] Add config entity of celeborn.rpc.io.threads
2023-04-21 11:21:41 +08:00
Angerszhuuuu
e319b99a1c
[CELEBORN-527][DOC] Fix incorrect monitor the arrangement of documents ( #1432 )
2023-04-17 11:12:19 +08:00
Angerszhuuuu
ecafbf41fc
[CELEBORN-516][FOLLOWUP] Remove removed RPC metrics in metric doc ( #1431 )
2023-04-17 10:46:04 +08:00
cxzl25
13f772e0c0
[CELEBORN-525] Fix wrong parameter celeborn.push.buffer.size
2023-04-14 20:45:25 +08:00
Cheng Pan
fb7b311c89
[CELEBORN-499] Move version specific resource to main repo ( #1429 )
...
* [CELEBORN-499] Move version specific resource to main repo
* license
2023-04-14 16:20:51 +08:00
Ethan Feng
9cccfc9872
[CELEBORN-431][FLINK] Support dynamic buffer allocation in reading map partition. ( #1407 )
2023-04-13 10:37:47 +08:00
Angerszhuuuu
e5722126e9
[CELEBORN-502][REFACTOR] Merge GetBlacklistResponse to HeartbeatFromApplication ( #1408 )
...
* [CELEBORN-502][REFACTOR] Merge GetBlacklistResponse to HeartbeatFromApplication
2023-04-12 14:59:32 +08:00
Angerszhuuuu
cad2836e85
[CELEBORN-505] Fix typo of SHUFFLE_CHUCK_SIZE ( #1411 )
2023-04-04 19:15:30 +08:00
Keyong Zhou
2e1598c011
[CELEBORN-485] Make celeborn.push.replicate.enabled default to false ( #1394 )
2023-04-03 16:36:29 +08:00
Angerszhuuuu
bf46336d54
[CELEBORN-487][PERF] ShuffleClientSide support blacklist to avoid client side timeout in same worker multiple times ( #1399 )
2023-04-03 11:50:04 +08:00
zhongqiangchen
cd92c423cd
[CELEBORN-475] Support extra tags for prometheus metrics ( #1385 )
...
[CELEBORN-475] Support extra tags for prometheus metrics
2023-03-28 21:22:28 +08:00
Keyong Zhou
cb19ed1c66
[CELEBORN-479][PERF] Refactor DataPushQueue.takePushTask to avoid busy wait ( #1386 )
2023-03-27 16:18:55 +08:00
Shuang
89b3f3887d
[CELEBORN-356] [FLINK] Support release single partition resource ( #1314 )
2023-03-24 17:15:28 +08:00
Ethan Feng
0ebad677d7
[CELEBORN-434] Add constrain about memory manager's parameters. ( #1356 )
2023-03-17 15:14:03 +08:00
Angerszhuuuu
4b334df7a6
[CELEBORN-399] Make fileSorterExecutors thread num can be customized ( #1325 )
2023-03-10 21:10:43 +08:00
Keyong Zhou
dcedf7b0a9
[CELEBORN-348] Support fetchTime in load-aware slots assignment strategy ( #1287 )
2023-03-02 18:31:50 +08:00
zhongqiangchen
cb76c4de4c
[CELEBORN-350][FLINK] Add PluginConf to be compatible with old configurations
2023-02-28 20:36:11 +08:00
Keyong Zhou
7adf1fca41
[CELEBORN-295] Optimize data push ( #1232 )
...
* [CELEBORN-295] Add double buffer for sort pusher
2023-02-28 10:35:55 +08:00
Ethan Feng
0c8bb83114
[CELEBORN-234] Implement buffer stream. ( #1221 )
2023-02-17 17:38:36 +08:00
Ethan Feng
3aacede5f8
[CELEBORN-283] Derive network layer for flink plugin. ( #1222 )
2023-02-17 14:12:54 +08:00
jiaoqingbo
3a92b0d911
[CELEBORN-284] fix typo in CelebornConf ( #1218 )
...
Co-authored-by: jiaoqb <jiaoqb@asiainfo.com>
2023-02-10 14:59:36 +08:00
Rex(Hui) An
bff6e91e0b
[CELEBORN-227] Support different push strategies to control the push speed ( #1167 )
2023-02-07 14:24:30 +08:00
Rex(Hui) An
bb113ec9be
[CELEBORN-207] Support network congestion control ( #1066 )
2023-02-07 12:06:18 +08:00
Angerszhuuuu
4b6f7e4593
[CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too ( #1185 )
2023-02-03 11:53:15 +08:00
Angerszhuuuu
04427f2b16
[CELEBORN-247] Add metrics for each user's quota usage ( #1182 )
2023-02-02 18:31:08 +08:00
Angerszhuuuu
122da47815
[CELEBORN-241][IMPROVEMENT] limit inflight push timeout should > push data timeout ( #1179 )
2023-01-30 11:57:07 +08:00
zy.jordan
c5be79ee3d
[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker ( #1102 )
2023-01-20 10:18:45 +08:00
Ethan Feng
a239f9f284
[CELEBORN-228]Refactor PartitionFileSorter to avoid specific JDK dependency. ( #1168 )
2023-01-16 20:06:47 +08:00
zy.jordan
bb96700415
[CELEBORN-223] The default rpc thread num of pushServer/replicateServer/fetchServer should be the number of total of Flusher's thread ( #1163 )
2023-01-16 12:03:46 +08:00
Keyong Zhou
fa7ba43136
[CELEBORN-225] Add global default configuration for number of flusher… ( #1165 )
2023-01-14 13:20:44 +08:00
zhongqiangczq
411ab09ffb
[CELEBORN-158][Flink] Add ShuffleServiceFactory to Support MapPartition in … ( #1105 )
2023-01-13 16:38:46 +08:00
Shuang
1332362bff
[CELEBORN-213] Add configuration for whether to close idle connections in client side ( #1157 )
2023-01-10 19:13:33 +08:00
zy.jordan
19197b9190
[CELEBORN-214] Push/Replicate/Fetch io threads default value is 16 ( #1158 )
2023-01-10 17:46:56 +08:00
Angerszhuuuu
e155ec122a
[CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice ( #1136 )
2023-01-10 11:39:40 +08:00
Angerszhuuuu
415452d9c4
[CELEBORN-189][IMPROVEMENT] PushDataFailedSlave should add slave worker to blacklist ( #1135 )
2023-01-05 20:12:07 +08:00
RexAn
6432a129be
[CELEBORN-61][CELEBORN-62][FOLLOW_UP] Fix some issues for slow start ( #1119 )
2022-12-29 12:07:20 +08:00
Ethan Feng
5aa959a335
[CELEBORN-157] Change prefix of configurations to celeborn. ( #1104 )
2022-12-21 15:17:28 +08:00
Keyong Zhou
2f0682265e
[CELEBORN-119] Add timeout for pushdata ( #1097 )
2022-12-20 20:40:42 +08:00
nafiy
c931663e5f
[CELEBORN-110][REFACTOR] Notify critical error after collecting a certain number of non-critical error ( #1055 )
2022-12-16 15:47:36 +08:00
nafiy
2e37830a0f
[CELEBORN-139][BUG] Fix read wrong yaml file format when loading config ( #1083 )
2022-12-14 20:56:04 +08:00
Angerszhuuuu
de3ef0d694
[CELEBORN-102][REFACTOR] TIMEOUT default value should be changed with network timeout ( #1047 )
...
* [CELEBORN-102][REFACTOR] TIMEOUT default value should be changed with network timeout
2022-12-06 14:41:23 +08:00
Ethan Feng
acfaf59ab3
[CELEBORN-91] Refactor memory tracker to support read buffer. ( #1038 )
...
* [CELEBORN-91] Refactor memory tracker to support read buffer.
2022-12-05 15:38:43 +08:00
nafiy
8e384cda5a
[CELEBORN-88][REFACTOR] Revive/PartitionSplit should set separated timeout configuration ( #1046 )
2022-12-05 10:36:43 +08:00
nafiy
44d45c2a27
[CELEBORN-90][REFACTOR] GetReducerFileGroup should support separated timeout configuration ( #1045 )
2022-12-02 22:53:51 +08:00
nafiy
13e1e24035
[CELEBORN-86][REFATCOR] Register shuffle should have separated timeout configuration ( #1031 )
...
* [CELEBORN-86][REFATCOR] Register shuffle should have separated timeout configuration
2022-12-01 18:39:56 +08:00
nafiy
d584211a75
[CELEBORN-95][REFACTOR]Rename CLIENT_RPC_ASK_TIMEOUT to HA_CLIENT_RPC_ASK_TIMEOUT ( #1037 )
2022-12-01 11:57:02 +08:00
zhongqiangczq
898d1126a6
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write: send handshake/regionstart/regionfinish ( #1035 )
2022-12-01 11:20:55 +08:00
Angerszhuuuu
d26e73209b
[CELEBORN-76] Support batch commit hard split partition before stage end
2022-11-29 13:09:01 +08:00
Cheng Pan
9bf4c65357
[CELEBORN-72][DOCS] Remove unused website resources from main repo ( #1014 )
2022-11-28 09:47:30 +08:00
Keyong Zhou
f8bb2cd47d
[CELEBORN-12]Retry on CommitFile request ( #1011 )
2022-11-26 20:56:24 +08:00
Keyong Zhou
9214b82181
[CELEBORN-68] Client might fetch incorrect data chunk ( #1010 )
2022-11-26 18:06:06 +08:00
Ethan Feng
ee243f286d
[CELEBORN-4] Add metrics about top disk used apps. ( #985 )
2022-11-22 20:06:36 +08:00
Gabriel
5ecb09d62a
[ISSUE-911] Decrease numConnectionsPerPeer to achieve better performance ( #983 )
2022-11-20 11:46:17 +08:00
leesf
3699683a3b
Fix and migrate some configs ( #927 )
2022-11-07 09:41:38 +08:00
Kerwin Zhang
db08d49032
[FEATURE] Support columnar shuffle codegen ( #915 )
2022-11-04 20:54:13 +08:00
Angerszhuuuu
38e15d89e6
[ISSUE-902][IMPROVEMENT][FOLLOWUP] LifecycleManager should reserve blacklist with irrecoverable status ( #914 )
2022-11-04 15:54:45 +08:00
Angerszhuuuu
87fcfa767f
[ISSUE-887][REFACTOR] Configuration type convert to Enum ( #888 )
...
* [ISSUE-332][FOLLOWUP] Add deps in worker's pom
* [Refactor] Modify package name of utils to keep consistence
* [Refactor] Modify package name of utils to keep consistence
* [REFACTOR] Remove unused isRegistered in controller
* [ISSUE-887][REFACTOR] Configuration type convert to Enum
* update
* update
* Update RssShuffleManager.java
2022-10-29 13:41:06 +08:00
Cheng Pan
d7be6006e7
Migrate network related conf to structured conf system ( #875 )
...
* Migrate network related conf to structured conf system
* migrate
* fix
* fix
* worker
* fix
* nit
* review
* nit
2022-10-28 10:45:52 +08:00
Angerszhuuuu
d283cca4e1
[ISSUE-869][REFACTOR] Migrate partition size/sorter related conf to Celeborn ConfigEntity ( #870 )
2022-10-27 16:49:55 +08:00
Angerszhuuuu
26dcc118c6
[ISSUE-871][REFACTOR] Migrate Worker conf to Celeborn Configuration System ( #873 )
...
* [ISSUE-871][REFACTOR] Migrate Worker conf to Celeborn Configuration System
2022-10-27 15:35:29 +08:00
Angerszhuuuu
399236c880
[ISSUE-849][REFACTOR] Migrate master and common Celeborn Configuration System ( #850 )
2022-10-26 17:09:27 +08:00
Angerszhuuuu
89c3013122
[ISSUE-851][REFACTOR] Migrate quota configruation to Celeborn Configuration System ( #852 )
...
* [ISSUE-851][REFACTOR] Migrate quota configruation to Celeborn Configuration System
2022-10-26 14:09:44 +08:00
nafiy
e44e8c9610
[ISSUE-828][REFACTOR] Migrate memory tracker related configs to ConfigEntry ( #831 )
...
* [ISSUE-828][REFACTOR] Migrate memory tracker related configs to ConfigEntry
* Fix based on review
* update doc
* resolve review feedback
* fix
* Fix based on review
* fix based on review
2022-10-25 21:16:53 +08:00
Ethan Feng
8800fc4a8e
[Refactor] Refine rpc cache configs ( #853 )
...
* refine rpc cache configs.
* update.
* update.
* update.
2022-10-25 20:28:18 +08:00
Ethan Feng
45ef716737
[Feature] Cache GetReducerFileGroupResponse to avoid lifecycle manager oom. ( #792 )
2022-10-25 16:16:44 +08:00
Cheng Pan
e71c0228aa
Migrate columnar shuffle configurations to ConfigEntry ( #844 )
2022-10-25 14:26:11 +08:00
AngersZhuuuu
2ebf873b3c
[ISSUE-845][REFACTOR] Migrate partition split related conf to Celeborn Configuration System ( #846 )
...
[ISSUE-845][REFACTOR] Migrate partition split related conf to Celeborn Configuration System
2022-10-25 10:54:45 +08:00
AngersZhuuuu
0bd0a3e9f4
[ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System ( #848 )
...
* [ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System
* Update CelebornConf.scala
* follow comments
* update
* update
* update
* Update client.md
2022-10-25 09:16:46 +08:00
Cheng Pan
e3d649fff3
Change slot to slots for consistency ( #843 )
2022-10-24 20:49:28 +08:00
AngersZhuuuu
0fdb19065a
[ISSUE-841][REFACTOR] Migrate shuffle client side conf to Celeborn Configuration System ( #842 )
2022-10-24 20:48:48 +08:00
Cheng Pan
8d7d397e71
Fix Configuration page and polish naming ( #838 )
...
* Fix Configuration page and polish naming
* nit
* nit
* comment
2022-10-24 12:46:25 +08:00
Ethan Feng
392a252baa
[FOLLOWUP][ISSUE-813]Update doc and fix typo. ( #825 )
2022-10-22 23:02:22 +08:00
nafiy
1a8a36e8fe
[ISSUE-812][Refactor] Migrate metrics system related configs to ConfigEntry ( #821 )
2022-10-21 13:57:58 +08:00
Ethan Feng
5c761a8df3
[ISSUE-813][Refactor] Refactor flusher configurations. ( #813 )
...
* Refactor flusher configurations.
* Refactor flusher configurations.
* Update.
* remove brackets.
* update docs.
* rename.
* update.
* update docs.
* update.
* update.
* update.
* update.
* update.
* update.
* update.
* format.
* update.
* update.
2022-10-20 15:23:17 +08:00
AngersZhuuuu
23c65a27a9
[ISSUE-798][REFACTOR] Migrate worker-recover related conf to ConfigEntry ( #799 )
2022-10-19 16:42:00 +08:00