Commit Graph

1099 Commits

Author SHA1 Message Date
Shuang
e2196e9383
[CELEBORN-56] [ISSUE-945] handle map partition mapper end (#1003) 2022-12-07 21:09:02 +08:00
Shuang
f3f104870c
[CELEBORN-75] Initialize flink plugin module (#1027) 2022-12-07 15:53:00 +08:00
Angerszhuuuu
0d38bad78a
[CELEBORN-20][REFACTOR] Extract CommitManager from LifecycleManager (#1050) 2022-12-06 22:26:18 +08:00
Angerszhuuuu
de3ef0d694
[CELEBORN-102][REFACTOR] TIMEOUT default value should be changed with network timeout (#1047) 2022-12-06 14:41:23 +08:00
Angerszhuuuu
1e4dec96b9
[CELEBORN-21][REFACTOR] Extract revive-related logic from LifecycleManager (#1024) 2022-12-05 17:05:17 +08:00
Ethan Feng
acfaf59ab3
[CELEBORN-91] Refactor memory tracker to support read buffer. (#1038) 2022-12-05 15:38:43 +08:00
Angerszhuuuu
5eaad136a0
[CELEBORN-84][IMPROVEMENT] Blacklist critical reason should avoid being covered by normal reason (#1043) 2022-12-05 14:02:33 +08:00
zhongqiangczq
b262591da8
[CELEBORN-71] pushdatahandler supports mappartition write: handshake/regionstart/regionfinish (#1013) 2022-12-05 13:05:35 +08:00
nafiy
8e384cda5a
[CELEBORN-88][REFACTOR] Revive/PartitionSplit should set separated timeout configuration (#1046) 2022-12-05 10:36:43 +08:00
nafiy
44d45c2a27
[CELEBORN-90][REFACTOR] GetReducerFileGroup should support separated timeout configuration (#1045) 2022-12-02 22:53:51 +08:00
Binjie Yang
d6ee3c18bc
[CELEBORN-98][IMPROVEMENT] Remove unreachable code block in master/worker arguments (#1042) 2022-12-02 22:53:28 +08:00
Ethan Feng
d65650f764
[CELEBORN-97] Correct notification mailing list. (#1040) 2022-12-02 16:42:56 +08:00
Shuang
3a4c3c03a0
[CELEBORN-76][FOLLOWUP] fix inFlightCommitRequest counting problem (#1034) 2022-12-02 16:25:59 +08:00
Angerszhuuuu
fc5ca42c14
[CELEBORN-96][REFACTOR] PushMergedData return partition not found use same code path (#1039) 2022-12-02 14:09:00 +08:00
nafiy
13e1e24035
[CELEBORN-86][REFACTOR] Register shuffle should have separated timeout configuration (#1031) 2022-12-01 18:39:56 +08:00
Angerszhuuuu
017b3d2b41
[CELEBORN-94][BUG] StateMachine should implement pause to change status (#1033) 2022-12-01 12:16:44 +08:00
nafiy
d584211a75
[CELEBORN-95][REFACTOR] Rename CLIENT_RPC_ASK_TIMEOUT to HA_CLIENT_RPC_ASK_TIMEOUT (#1037) 2022-12-01 11:57:02 +08:00
zhongqiangczq
898d1126a6
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write: send handshake/regionstart/regionfinish (#1035) 2022-12-01 11:20:55 +08:00
RexAn
bb5a4d2180
[CELEBORN-63] Add CONGESTION related status codes (#1028)
* Increase push data return reason types such as CONGESTION, etc.
2022-12-01 10:55:37 +08:00
Binjie Yang
a31dcc8194
[IMPROVEMENT] Improve celeborn script logic (#1020) 2022-11-30 20:03:56 +08:00
Angerszhuuuu
5b9102d792
[CELEBORN-93][BUG] Rss Raft reject install snapshot (#1032) 2022-11-30 19:43:47 +08:00
Angerszhuuuu
7f8e66afbc
[CELEBORN-76][FOLLOWUP] Support batch commit hard split partition before stage end (#1030) 2022-11-30 19:42:04 +08:00
Ethan Feng
dd02070e4b
[CELEBORN-83] Fix various bugs when using HDFS as storage.
1. fix incompatibility between Hadoop 2 and Hadoop 3.
2. fix HDFS writer never being called when there are no healthy disks.
3. fix an NPE when the HDFS file writer closes.
2022-11-30 19:33:18 +08:00
Angerszhuuuu
5ad4415c68
[CELEBORN-78][REFACTOR] Extract heartbeater from LifecycleManager (#1021) 2022-11-29 19:14:55 +08:00
Ethan Feng
02e446284d
[CELEBORN-74] Device monitor should respect storage dir configured usable space (#1023) 2022-11-29 17:10:18 +08:00
Angerszhuuuu
01dc9d4259
[CELEBORN-79][REFACTOR] Remove unused responseCheckerThread from LifecycleManager (#1022) 2022-11-29 15:25:37 +08:00
Angerszhuuuu
d26e73209b
[CELEBORN-76] Support batch commit hard split partition before stage end 2022-11-29 13:09:01 +08:00
Angerszhuuuu
c8e5315b9c
[CELEBORN-23][FOLLOWUP] Both master and slave data should return HARD_SPLIT during shutdown (#1018) 2022-11-28 22:05:07 +08:00
Angerszhuuuu
13f4ce2be6
[CELEBORN-68][FOLLOWUP] Retry on same partition location should have a retry wait interval (#1017) 2022-11-28 20:17:08 +08:00
Keyong Zhou
61e04b77fd
[CELEBORN-70][FOLLOWUP] Add epoch for each commitFiles request. (#1015)
* [CELEBORN-70][FOLLOWUP] Add epoch for each commitFiles request. Address comments.
2022-11-28 14:08:20 +08:00
Ethan Feng
cfa9b7f700
[CELEBORN-18] Refactor stream manager to distinguish map partition and reduce partition. (#997) 2022-11-28 12:02:38 +08:00
Cheng Pan
9bf4c65357
[CELEBORN-72][DOCS] Remove unused website resources from main repo (#1014) 2022-11-28 09:47:30 +08:00
Keyong Zhou
d381df71f8
[CELEBORN-70] Add epoch for each commitFiles request (#1012) 2022-11-27 21:05:14 +08:00
nafiy
817eee969f
[CELEBORN-58][REFACTOR] Aggregate reserve failed logs together (#1005) 2022-11-26 20:56:39 +08:00
Keyong Zhou
f8bb2cd47d
[CELEBORN-12] Retry on CommitFile request (#1011) 2022-11-26 20:56:24 +08:00
Keyong Zhou
9214b82181
[CELEBORN-68] Client might fetch incorrect data chunk (#1010) 2022-11-26 18:06:06 +08:00
Keyong Zhou
04e86062f0
[CELEBORN-69] Fullyread check in FileManagedBuffers is not accurate (#1008) 2022-11-26 15:04:29 +08:00
Ethan Feng
93dbf3f8b1
[CELEBORN-67] Revert "Fix fetch incorrect data chunk" related commits (#1006)
* Revert "[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999)"

This reverts commit 1e8f6dc5e8.

* Revert "[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000)"

This reverts commit f1c4d675d6.

* Revert "[CELEBORN-49] Deadlock when kill worker in shuffle read (#998)"

This reverts commit 0be4b3399c.

* Revert "[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995)"

This reverts commit 2b05228871.

* Revert "[BUG] Fix fetch incorrect data chunk (#926)"

This reverts commit 6f043f8a

* Revert "[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback (#954)"

This reverts commit 64e8ebf1
2022-11-25 20:57:47 +08:00
nafiy
fe13e9e261
[CELEBORN-59][REFACTOR] Support send destroy slots request in parallel (#1004) 2022-11-25 18:26:05 +08:00
Angerszhuuuu
1e8f6dc5e8
[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999)
* [CELEBORN-48][BUG] Channel inactive may cause new client use old stream id to fetch data
2022-11-23 18:22:06 +08:00
Ethan Feng
f1c4d675d6
[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000) 2022-11-23 18:07:57 +08:00
Keyong Zhou
0be4b3399c
[CELEBORN-49] Deadlock when kill worker in shuffle read (#998) 2022-11-23 17:31:05 +08:00
William Song
735ba4ce0c
[CELEBORN-44][BUG] StateMachine not update currentSnapshot after takeSnapshot cause getLatestSnapshot return null (#996) 2022-11-23 16:00:14 +08:00
Angerszhuuuu
2b05228871
[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995) 2022-11-23 11:56:10 +08:00
Keyong Zhou
cfc1fa15bd
[CELEBORN-46] Refine log for RssInputStream.close() (#994) 2022-11-22 22:01:08 +08:00
Ethan Feng
ee243f286d
[CELEBORN-4] Add metrics about top disk used apps. (#985) 2022-11-22 20:06:36 +08:00
Angerszhuuuu
e12000cb67
[CELEBORN-42][BUG] PushMergedData use wrong call back when partition not found (#991) 2022-11-22 18:29:15 +08:00
Ethan Feng
20c00fd8eb
[CELEBORN-5] Update contributing guide. (#986) 2022-11-22 15:25:59 +08:00
Shuang
1656458788
[CELEBORN-14] [ISSUE-955] support register attempt map task (#984) 2022-11-22 15:23:20 +08:00
Angerszhuuuu
5ec278f99a
[ISSUE-987][FEATURE] During worker shutdown, return HARD_SPLIT for all existing partitions (#988) 2022-11-22 14:29:55 +08:00