163 Commits

Author SHA1 Message Date
Angerszhuuuu
122da47815
[CELEBORN-241][IMPROVEMENT] limit inflight push timeout should > push data timeout (#1179) 2023-01-30 11:57:07 +08:00
zy.jordan
c5be79ee3d
[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker (#1102) 2023-01-20 10:18:45 +08:00
Ethan Feng
a239f9f284
[CELEBORN-228]Refactor PartitionFileSorter to avoid specific JDK dependency. (#1168) 2023-01-16 20:06:47 +08:00
zy.jordan
bb96700415
[CELEBORN-223] The default rpc thread num of pushServer/replicateServer/fetchServer should be the number of total of Flusher's thread (#1163) 2023-01-16 12:03:46 +08:00
Keyong Zhou
fa7ba43136
[CELEBORN-225] Add global default configuration for number of flusher… (#1165) 2023-01-14 13:20:44 +08:00
zhongqiangczq
411ab09ffb
[CELEBORN-158][Flink] Add ShuffleServiceFactory to Support MapPartition in … (#1105) 2023-01-13 16:38:46 +08:00
Shuang
1332362bff
[CELEBORN-213] Add configuration for whether to close idle connections in client side (#1157) 2023-01-10 19:13:33 +08:00
zy.jordan
19197b9190
[CELEBORN-214] Push/Replicate/Fetch io threads default value is 16 (#1158) 2023-01-10 17:46:56 +08:00
Angerszhuuuu
e155ec122a
[CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice (#1136) 2023-01-10 11:39:40 +08:00
Angerszhuuuu
415452d9c4
[CELEBORN-189][IMPROVEMENT] PushDataFailedSlave should add slave worker to blacklist (#1135) 2023-01-05 20:12:07 +08:00
RexAn
6432a129be
[CELEBORN-61][CELEBORN-62][FOLLOW_UP] Fix some issues for slow start (#1119) 2022-12-29 12:07:20 +08:00
Ethan Feng
5aa959a335
[CELEBORN-157] Change prefix of configurations to celeborn. (#1104) 2022-12-21 15:17:28 +08:00
Keyong Zhou
2f0682265e
[CELEBORN-119] Add timeout for pushdata (#1097) 2022-12-20 20:40:42 +08:00
nafiy
c931663e5f
[CELEBORN-110][REFACTOR] Notify critical error after collecting a certain number of non-critical error (#1055) 2022-12-16 15:47:36 +08:00
nafiy
2e37830a0f
[CELEBORN-139][BUG] Fix read wrong yaml file format when loading config (#1083) 2022-12-14 20:56:04 +08:00
Angerszhuuuu
de3ef0d694
[CELEBORN-102][REFACTOR] TIMEOUT default value should be changed with network timeout (#1047)
2022-12-06 14:41:23 +08:00
Ethan Feng
acfaf59ab3
[CELEBORN-91] Refactor memory tracker to support read buffer. (#1038)
2022-12-05 15:38:43 +08:00
nafiy
8e384cda5a
[CELEBORN-88][REFACTOR] Revive/PartitionSplit should set separated timeout configuration (#1046) 2022-12-05 10:36:43 +08:00
nafiy
44d45c2a27
[CELEBORN-90][REFACTOR] GetReducerFileGroup should support separated timeout configuration (#1045) 2022-12-02 22:53:51 +08:00
nafiy
13e1e24035
[CELEBORN-86][REFACTOR] Register shuffle should have separated timeout configuration (#1031)
2022-12-01 18:39:56 +08:00
nafiy
d584211a75
[CELEBORN-95][REFACTOR]Rename CLIENT_RPC_ASK_TIMEOUT to HA_CLIENT_RPC_ASK_TIMEOUT (#1037) 2022-12-01 11:57:02 +08:00
zhongqiangczq
898d1126a6
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write: send handshake/regionstart/regionfinish (#1035) 2022-12-01 11:20:55 +08:00
Angerszhuuuu
d26e73209b
[CELEBORN-76] Support batch commit hard split partition before stage end 2022-11-29 13:09:01 +08:00
Cheng Pan
9bf4c65357
[CELEBORN-72][DOCS] Remove unused website resources from main repo (#1014) 2022-11-28 09:47:30 +08:00
Keyong Zhou
f8bb2cd47d
[CELEBORN-12]Retry on CommitFile request (#1011) 2022-11-26 20:56:24 +08:00
Keyong Zhou
9214b82181
[CELEBORN-68] Client might fetch incorrect data chunk (#1010) 2022-11-26 18:06:06 +08:00
Ethan Feng
ee243f286d
[CELEBORN-4] Add metrics about top disk used apps. (#985) 2022-11-22 20:06:36 +08:00
Gabriel
5ecb09d62a
[ISSUE-911] Decrease numConnectionsPerPeer to achieve better performance (#983) 2022-11-20 11:46:17 +08:00
leesf
3699683a3b
Fix and migrate some configs (#927) 2022-11-07 09:41:38 +08:00
Kerwin Zhang
db08d49032
[FEATURE] Support columnar shuffle codegen (#915) 2022-11-04 20:54:13 +08:00
Angerszhuuuu
38e15d89e6
[ISSUE-902][IMPROVEMENT][FOLLOWUP] LifecycleManager should reserve blacklist with irrecoverable status (#914) 2022-11-04 15:54:45 +08:00
Angerszhuuuu
87fcfa767f
[ISSUE-887][REFACTOR] Configuration type convert to Enum (#888)
* [ISSUE-332][FOLLOWUP] Add deps in worker's pom

* [Refactor] Modify package name of utils to keep consistence

* [REFACTOR] Remove unused isRegistered in controller

* update

* update

* Update RssShuffleManager.java
2022-10-29 13:41:06 +08:00
Cheng Pan
d7be6006e7
Migrate network related conf to structured conf system (#875)
* migrate

* fix

* fix

* worker

* fix

* nit

* review

* nit
2022-10-28 10:45:52 +08:00
Angerszhuuuu
d283cca4e1
[ISSUE-869][REFACTOR] Migrate partition size/sorter related conf to Celeborn ConfigEntity (#870) 2022-10-27 16:49:55 +08:00
Angerszhuuuu
26dcc118c6
[ISSUE-871][REFACTOR] Migrate Worker conf to Celeborn Configuration System (#873)
2022-10-27 15:35:29 +08:00
Angerszhuuuu
399236c880
[ISSUE-849][REFACTOR] Migrate master and common Celeborn Configuration System (#850) 2022-10-26 17:09:27 +08:00
Angerszhuuuu
89c3013122
[ISSUE-851][REFACTOR] Migrate quota configuration to Celeborn Configuration System (#852)
2022-10-26 14:09:44 +08:00
nafiy
e44e8c9610
[ISSUE-828][REFACTOR] Migrate memory tracker related configs to ConfigEntry (#831)
* Fix based on review

* update doc

* resolve review feedback

* fix

* Fix based on review

* fix based on review
2022-10-25 21:16:53 +08:00
Ethan Feng
8800fc4a8e
[Refactor] Refine rpc cache configs (#853)
* refine rpc cache configs.

* update.

* update.

* update.
2022-10-25 20:28:18 +08:00
Ethan Feng
45ef716737
[Feature] Cache GetReducerFileGroupResponse to avoid lifecycle manager oom. (#792) 2022-10-25 16:16:44 +08:00
Cheng Pan
e71c0228aa
Migrate columnar shuffle configurations to ConfigEntry (#844) 2022-10-25 14:26:11 +08:00
AngersZhuuuu
2ebf873b3c
[ISSUE-845][REFACTOR] Migrate partition split related conf to Celeborn Configuration System (#846)
2022-10-25 10:54:45 +08:00
AngersZhuuuu
0bd0a3e9f4
[ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System (#848)
* Update CelebornConf.scala

* follow comments

* update

* update

* update

* Update client.md
2022-10-25 09:16:46 +08:00
Cheng Pan
e3d649fff3
Change slot to slots for consistency (#843) 2022-10-24 20:49:28 +08:00
AngersZhuuuu
0fdb19065a
[ISSUE-841][REFACTOR] Migrate shuffle client side conf to Celeborn Configuration System (#842) 2022-10-24 20:48:48 +08:00
Cheng Pan
8d7d397e71
Fix Configuration page and polish naming (#838)
* nit

* nit

* comment
2022-10-24 12:46:25 +08:00
Ethan Feng
392a252baa
[FOLLOWUP][ISSUE-813]Update doc and fix typo. (#825) 2022-10-22 23:02:22 +08:00
nafiy
1a8a36e8fe
[ISSUE-812][Refactor] Migrate metrics system related configs to ConfigEntry (#821) 2022-10-21 13:57:58 +08:00
Ethan Feng
5c761a8df3
[ISSUE-813][Refactor] Refactor flusher configurations. (#813)
* Update.

* remove brackets.

* update docs.

* rename.

* update.

* update docs.

* update.

* update.

* update.

* update.

* update.

* update.

* update.

* format.

* update.

* update.
2022-10-20 15:23:17 +08:00
AngersZhuuuu
23c65a27a9
[ISSUE-798][REFACTOR] Migrate worker-recover related conf to ConfigEntry (#799) 2022-10-19 16:42:00 +08:00
Cheng Pan
cb07cf62c0
Auto generate configuration docs (#794) 2022-10-19 10:50:22 +08:00
Cheng Pan
ea67f4e060
Introduce categories to ConfigEntry and migrate configurations (#775) 2022-10-17 16:56:54 +08:00
Cheng Pan
f01a696313
Migrate and refactor configuration for master endpoints (#752) 2022-10-11 21:33:21 +08:00
AngersZhuuuu
bbb4f8e225
[ISSUE-306][IMPROVEMENT] Handle change partition request in batch (#622) 2022-10-10 18:31:37 +08:00
AngersZhuuuu
db9ce36608
[ISSUE-690][DOC] Storage resource quota doc (#703) 2022-10-09 20:01:50 +08:00
Keyong Zhou
a2d2379153
[DOC] Replace RSS with Celeborn in docs (#715) 2022-10-06 10:37:46 +08:00
Kerwin Zhang
5da5ea950a
[DOC] Update columnar shuffle config (#713) 2022-10-06 08:48:35 +08:00
Keyong Zhou
fe3b5988f2
[REFACTOR] Change package name to org.apache.celeborn (#710) 2022-10-02 18:10:29 +08:00
Kerwin Zhang
7937124fc9
[ISSUE-670] [FEATURE] Add configuration for columnar shuffle (#671) 2022-09-24 17:56:21 +08:00
Ethan Feng
30d4323cdb
[FEATURE] Add a configuration to enable a map id filter mechanism. #662 (#663) 2022-09-23 18:38:52 +08:00
AngersZhuuuu
df5ba55ea5
[ISSUE-633][FEATURE] Support provider user identity by customized class and keep LifecycleManager and ShuffleClient user identity consistence (#646) 2022-09-21 17:35:59 +08:00
nafiy
c4f40eed90
[ISSUE-610][BUG] PartitionFilesSorter sorting pendingBuffer not enough cause job failed (#616) 2022-09-19 19:43:04 +08:00
Cheng Pan
f2ca6d68e4
[DOCS] Build website (#579) 2022-09-10 00:45:13 +08:00