Commit Graph

71 Commits

Author SHA1 Message Date
Cheng Pan
ec371c0026
[CELEBORN-132] ShuffleClient should not implement Cloneable (#1077) 2022-12-14 10:04:39 +08:00
Keyong Zhou
5a3d397781
[CELEBORN-130] Correct CommitFilesTime metric (#1073) 2022-12-13 20:02:21 +08:00
zhongqiangczq
97991a3404
[CELEBORN-126] Fileinfo adds member bufferSize (#1068) 2022-12-13 16:36:26 +08:00
zhongqiangczq
edf85de8f6
[CELEBORN-123] PushDataHandler handleRpcRequestCore fix bug about val isMaster (#1063) 2022-12-12 15:51:58 +08:00
zhongqiangczq
c7258cfc03
[CELEBORN-103] add handleMapPartitionPushData to support mappartition (#1048) 2022-12-08 11:22:43 +08:00
zhongqiangczq
ea1c630173
[CELEBORN-80] FileWriter supports MapPartition (#1025) 2022-12-08 10:46:26 +08:00
Ethan Feng
acfaf59ab3
[CELEBORN-91] Refactor memory tracker to support read buffer. (#1038)
2022-12-05 15:38:43 +08:00
zhongqiangczq
b262591da8
[CELEBORN-71] pushdatahandler supports mappartition write: handshake/regionstart/regionfinish (#1013) 2022-12-05 13:05:35 +08:00
Binjie Yang
d6ee3c18bc
[CELEBORN-98][IMPROVEMENT] Remove unreachable code block in master/work arguments (#1042) 2022-12-02 22:53:28 +08:00
Angerszhuuuu
fc5ca42c14
[CELEBORN-96][REFACTOR] PushMergedData return partition not found use same code path (#1039) 2022-12-02 14:09:00 +08:00
Ethan Feng
dd02070e4b
[CELEBORN-83] Fix various bugs when using HDFS as storage.
1. fix incompatibility between Hadoop 2 and Hadoop 3.
2. fix the HDFS writer never being called when there are no healthy disks.
3. fix an NPE when the HDFS file writer closes.
2022-11-30 19:33:18 +08:00
Ethan Feng
02e446284d
[CELEBORN-74] Device monitor should respect storage dir configured usable space (#1023) 2022-11-29 17:10:18 +08:00
Angerszhuuuu
d26e73209b
[CELEBORN-76] Support batch commit hard split partition before stage end 2022-11-29 13:09:01 +08:00
Angerszhuuuu
c8e5315b9c
[CELEBORN-23][FOLLOWUP] Both master and slave data should return HARD_SPLIT during shutdown (#1018) 2022-11-28 22:05:07 +08:00
Keyong Zhou
61e04b77fd
[CELEBORN-70][FOLLOWUP] Add epoch for each commitFiles request. (#1015)
* [CELEBORN-70][FOLLOWUP] Add epoch for each commitFiles request. Address comments.
2022-11-28 14:08:20 +08:00
Ethan Feng
cfa9b7f700
[CELEBORN-18] Refactor stream manager to distinguish map partition and reduce partition. (#997) 2022-11-28 12:02:38 +08:00
Keyong Zhou
d381df71f8
[CELEBORN-70] Add epoch for each commitFiles request (#1012) 2022-11-27 21:05:14 +08:00
Keyong Zhou
f8bb2cd47d
[CELEBORN-12]Retry on CommitFile request (#1011) 2022-11-26 20:56:24 +08:00
Keyong Zhou
9214b82181
[CELEBORN-68] Client might fetch incorrect data chunk (#1010) 2022-11-26 18:06:06 +08:00
Ethan Feng
93dbf3f8b1
[CELEBORN-67] Revert "Fix fetch incorrect data chunk" related commits (#1006)
* Revert "[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999)"

This reverts commit 1e8f6dc5e8.

* Revert "[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000)"

This reverts commit f1c4d675d6.

* Revert "[CELEBORN-49] Deadlock when kill worker in shuffle read (#998)"

This reverts commit 0be4b3399c.

* Revert "[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995)"

This reverts commit 2b05228871.

* Revert "[BUG] Fix fetch incorrect data chunk (#926)"

This reverts commit 6f043f8a

* Revert "[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback (#954)"

This reverts commit 64e8ebf1
2022-11-25 20:57:47 +08:00
Angerszhuuuu
2b05228871
[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995) 2022-11-23 11:56:10 +08:00
Ethan Feng
ee243f286d
[CELEBORN-4] Add metrics about top disk used apps. (#985) 2022-11-22 20:06:36 +08:00
Angerszhuuuu
e12000cb67
[CELEBORN-42][BUG] PushMergedData use wrong call back when partition not found (#991) 2022-11-22 18:29:15 +08:00
Angerszhuuuu
5ec278f99a
[ISSUE-987][FEATURE] During worker shutdown, return HARD_SPLIT for all existed partition (#988) 2022-11-22 14:29:55 +08:00
zhongqiangczq
7adcb5b933
[CELEBORN-6] [REFACTOR] PushDataHandler code refactor (#966) 2022-11-16 11:04:24 +08:00
leesf
0b8376e2c7
Cleanup some code (#943) 2022-11-11 13:58:39 +08:00
Ethan Feng
6f043f8ae9
[BUG] Fix fetch incorrect data chunk (#926) 2022-11-09 22:31:39 +08:00
leesf
aac68c3571
Rename RssException to CelebornException (#938) 2022-11-08 10:08:21 +08:00
leesf
496f44eda4
Shutdown worker if initialized failed. (#931) 2022-11-07 19:33:35 +08:00
Angerszhuuuu
99a7b85708
[ISSUE-932][REFACTOR] Device check should not directly reportError (#933)
2022-11-07 15:15:08 +08:00
nafiy
11081eac6c
[ISSUE-879][BUG] When notifyError, should destroy corresponding file writers (#912)
2022-11-07 14:01:51 +08:00
Angerszhuuuu
100e0057e8
[ISSUE-921][BUG] Flush Error should report non critical error (#928) 2022-11-07 11:56:11 +08:00
leesf
3699683a3b
Fix and migrate some configs (#927) 2022-11-07 09:41:38 +08:00
Angerszhuuuu
38e15d89e6
[ISSUE-902][IMPROVEMENT][FOLLOWUP] LifecycleManager should reserve blacklist with irrecoverable status (#914) 2022-11-04 15:54:45 +08:00
Angerszhuuuu
ea4ed10e5c
[ISSUE-901][BUG] During worker graceful shutdown, worker should report itself as unavailable and avoid master allocate slots on it. (#905) 2022-11-02 16:09:58 +08:00
Zhen Wang
643eb84541
[MINOR] Fix typo (#898) 2022-11-01 10:03:15 +08:00
nafiy
ce3dc889fa
[ISSUE-867][BUG] Create writer failed should report non-critical error instead of critical error (#883) 2022-10-31 21:23:16 +08:00
nafiy
9b1c70f219
[ISSUE-880][BUG] onTrim when flushFileWriters() should catch each file writer's exception, avoid block flush all file writers (#894) 2022-10-31 14:31:22 +08:00
Angerszhuuuu
87fcfa767f
[ISSUE-887][REFACTOR] Configuration type convert to Enum (#888)
* [ISSUE-332][FOLLOWUP] Add deps in worker's pom

* [Refactor] Modify package name of utils to keep consistence

* [Refactor] Modify package name of utils to keep consistence

* [REFACTOR] Remove unused isRegistered in controller

* [ISSUE-887][REFACTOR] Configuration type convert to Enum

* update

* update

* Update RssShuffleManager.java
2022-10-29 13:41:06 +08:00
Cheng Pan
d7be6006e7
Migrate network related conf to structured conf system (#875)
* Migrate network related conf to structured conf system

* migrate

* fix

* fix

* worker

* fix

* nit

* review

* nit
2022-10-28 10:45:52 +08:00
Angerszhuuuu
d283cca4e1
[ISSUE-869][REFACTOR] Migrate partition size/sorter related conf to Celeborn ConfigEntity (#870) 2022-10-27 16:49:55 +08:00
Angerszhuuuu
26dcc118c6
[ISSUE-871][REFACTOR] Migrate Worker conf to Celeborn Configuration System (#873)
2022-10-27 15:35:29 +08:00
Angerszhuuuu
5333819cb0
[ISSUE-866][BUG] Create File twice should show clear log (#876) 2022-10-27 14:52:45 +08:00
nafiy
e44e8c9610
[ISSUE-828][REFACTOR] Migrate memory tracker related configs to ConfigEntry (#831)
* [ISSUE-828][REFACTOR] Migrate memory tracker related configs to ConfigEntry

* Fix based on review

* update doc

* resolve review feedback

* fix

* Fix based on review

* fix based on review
2022-10-25 21:16:53 +08:00
AngersZhuuuu
0bd0a3e9f4
[ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System (#848)
* [ISSUE-847][REFACTOR] Migrate codec conf to Celeborn Configuration System

* Update CelebornConf.scala

* follow comments

* update

* update

* update

* Update client.md
2022-10-25 09:16:46 +08:00
Ethan Feng
4df0d4a456
[TEST] Fix unstable LZ4 unit test (#816) 2022-10-24 15:36:06 +08:00
Cheng Pan
8d7d397e71
Fix Configuration page and polish naming (#838)
* Fix Configuration page and polish naming

* nit

* nit

* comment
2022-10-24 12:46:25 +08:00
Ethan Feng
74843f20a9
[BUG] Fix worker lost caused by UnsupportedOperationException (#837) 2022-10-24 11:20:42 +08:00
Keyong Zhou
63752e7a37
[BUG] RegisterShuffle should not increase epoch (#833) 2022-10-23 23:40:32 +08:00
Ethan Feng
392a252baa
[FOLLOWUP][ISSUE-813]Update doc and fix typo. (#825) 2022-10-22 23:02:22 +08:00