178 Commits

Author SHA1 Message Date
Angerszhuuuu
fe8dfb05f3
[CELEBORN-196][REFACTOR] Rename batchHandleRequestPartitions to handleRequestPartitions (#1144) 2023-01-05 14:37:10 +08:00
Angerszhuuuu
2315f2f988
[CELEBORN-191][BUG] ShuffleClient registerShuffle return RESERVE_SLOTS_FAILED should also been print out (#1138) 2023-01-03 17:13:31 +08:00
Shuang
5cba307189
[CELEBORN-146] refactor ShuffleMapperAttempts & GetReducerFileGroup (#1116) 2022-12-30 18:15:23 +08:00
Cheng Pan
b8758a7cb6
[CELEBORN-181][TEST] Rename RssFunSuite to CelebornFunSuite (#1125) 2022-12-29 18:10:14 +08:00
RexAn
6432a129be
[CELEBORN-61][CELEBORN-62][FOLLOW_UP] Fix some issues for slow start (#1119) 2022-12-29 12:07:20 +08:00
Binjie Yang
63943cd5cc
[CELEBORN-147][IT]Extraction of common integration test cases (#1092) 2022-12-29 12:03:09 +08:00
Keyong Zhou
2f0682265e
[CELEBORN-119] Add timeout for pushdata (#1097) 2022-12-20 20:40:42 +08:00
Keyong Zhou
a2dd72f20c
[CELEBORN-155] Wrong TimeUnit for registerShuffleRetryWait in Shuffle… (#1099) 2022-12-19 17:32:18 +08:00
Shuang
13769f0f0a
[CELEBORN-121] Refactor batchHandleCommitPartition (#1089) 2022-12-19 12:39:39 +08:00
Ethan Feng
39394526a8
[CELEBORN-142]Keep committed partition locations semantic consistent when commit files on HDFS. (#1091) 2022-12-16 19:02:02 +08:00
nafiy
ddab27a1d7
[CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse (#1093)
2022-12-15 18:16:34 +08:00
Ethan Feng
65cb36c002
[CELEBORN-83][FOLLOWUP] Fix various bugs when using HDFS as storage. (#1065) 2022-12-15 15:20:29 +08:00
Shuang
e3576e4e7a
[CELEBORN-117] refactor CommitManager, implements M/R Partition Commi… (#1060) 2022-12-15 11:09:59 +08:00
Cheng Pan
ec371c0026
[CELEBORN-132] ShuffleClient should not implement Cloneable (#1077) 2022-12-14 10:04:39 +08:00
Angerszhuuuu
c924a4ff0d
[CELEBORN-61][CELEBORN-62][FEATURE] Shuffle client support slow start, congestion avoidance and congestion control (#1052) 2022-12-08 12:41:34 +08:00
zhongqiangczq
60f6f87832
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write:pushdata (#1036) 2022-12-08 12:31:47 +08:00
zhongqiangczq
d3d40f730c
[CELEBORN-106] flink-plugin supports shufflewrite:OutputGate (#1051) 2022-12-08 11:24:37 +08:00
Shuang
e2196e9383
[CELEBORN-56] [ISSUE-945] handle map partition mapper end (#1003) 2022-12-07 21:09:02 +08:00
Shuang
f3f104870c
[CELEBORN-75] Initialize flink plugin module (#1027) 2022-12-07 15:53:00 +08:00
Angerszhuuuu
0d38bad78a
[CELEBORN-20][REFACTOR] Extract CommitManager from LifecycleManager (#1050) 2022-12-06 22:26:18 +08:00
Angerszhuuuu
1e4dec96b9
[CELEBORN-21][REFACTOR] Extract revive related logical from LifecycleManager (#1024)
2022-12-05 17:05:17 +08:00
Angerszhuuuu
5eaad136a0
[CELEBORN-84][IMPROVEMENT] Blacklist critical reason should avoid been covered by normal reason (#1043)
2022-12-05 14:02:33 +08:00
nafiy
8e384cda5a
[CELEBORN-88][REFACTOR] Revive/PartitionSplit should set separated timeout configuration (#1046) 2022-12-05 10:36:43 +08:00
nafiy
44d45c2a27
[CELEBORN-90][REFACTOR] GetReducerFileGroup should support separated timeout configuration (#1045) 2022-12-02 22:53:51 +08:00
Shuang
3a4c3c03a0
[CELEBORN-76][FOLLOWUP] fix inFlightCommitRequest counting problem (#1034)
2022-12-02 16:25:59 +08:00
nafiy
13e1e24035
[CELEBORN-86][REFACTOR] Register shuffle should have separated timeout configuration (#1031)
2022-12-01 18:39:56 +08:00
zhongqiangczq
898d1126a6
[CELEBORN-11] ShuffleClient supports MapPartition shuffle write: send handshake/regionstart/regionfinish (#1035) 2022-12-01 11:20:55 +08:00
RexAn
bb5a4d2180
[CELEBORN-63] Add CONGESTION related status codes (#1028)
* Increase push data return reason types such as CONGESTION etc.
2022-12-01 10:55:37 +08:00
Angerszhuuuu
7f8e66afbc
[CELEBORN-76][FOLLOWUP] Support batch commit hard split partition before stage end (#1030)
2022-11-30 19:42:04 +08:00
Ethan Feng
dd02070e4b
[CELEBORN-83] Fix various bug when using HDFS as storage.
1. fix incompatibility between Hadoop 2 and Hadoop 3.
2. fix hdfs writer will never be called when there are no healthy disks.
3. fix an NPE when HDFS file writer close.
2022-11-30 19:33:18 +08:00
Angerszhuuuu
5ad4415c68
[CELEBORN-78][REFACTOR] Extract heartbearter from LifecycleManager (#1021)
2022-11-29 19:14:55 +08:00
Angerszhuuuu
01dc9d4259
[CELEBORN-79][REFACTOR] Remove unused responseCheckerThread from LifecycleManager (#1022) 2022-11-29 15:25:37 +08:00
Angerszhuuuu
d26e73209b
[CELEBORN-76] Support batch commit hard split partition before stage end 2022-11-29 13:09:01 +08:00
Angerszhuuuu
13f4ce2be6
[CELEBORN-68][FOLLOWUP] Retry on same partition location should have a retry wait interval (#1017) 2022-11-28 20:17:08 +08:00
Keyong Zhou
d381df71f8
[CELEBORN-70] Add epoch for each commitFiles request (#1012) 2022-11-27 21:05:14 +08:00
nafiy
817eee969f
[CELEBORN-58][REFACTOR] Aggregate reserve failed logs together (#1005) 2022-11-26 20:56:39 +08:00
Keyong Zhou
f8bb2cd47d
[CELEBORN-12]Retry on CommitFile request (#1011) 2022-11-26 20:56:24 +08:00
Keyong Zhou
9214b82181
[CELEBORN-68] Client might fetch incorrect data chunk (#1010) 2022-11-26 18:06:06 +08:00
Ethan Feng
93dbf3f8b1
[CELEBORN-67] Revert "Fix fetch incorrect data chunk" related commits (#1006)
* Revert "[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999)"

This reverts commit 1e8f6dc5e8.

* Revert "[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000)"

This reverts commit f1c4d675d6.

* Revert "[CELEBORN-49] Deadlock when kill worker in shuffle read (#998)"

This reverts commit 0be4b3399c.

* Revert "[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995)"

This reverts commit 2b05228871.

* Revert "[BUG] Fix fetch incorrect data chunk (#926)"

This reverts commit 6f043f8a

* Revert "[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback (#954)"

This reverts commit 64e8ebf1
2022-11-25 20:57:47 +08:00
nafiy
fe13e9e261
[CELEBORN-59][REFACTOR] Support send destroy slots request in parallel (#1004) 2022-11-25 18:26:05 +08:00
Angerszhuuuu
1e8f6dc5e8
[CELEBORN-50][FOLLOWUP] Channel inactive may cause new client use old stream id to fetch data (#999)
* [CELEBORN-48][BUG] Channel inactive may cause new client use old stream id to fetch data
2022-11-23 18:22:06 +08:00
Ethan Feng
f1c4d675d6
[CELEBORN-50] Channel inActive may cause new client use old stream id to fetch data cause IllegalStateException. (#1000) 2022-11-23 18:07:57 +08:00
Keyong Zhou
0be4b3399c
[CELEBORN-49] Deadlock when kill worker in shuffle read (#998) 2022-11-23 17:31:05 +08:00
Angerszhuuuu
2b05228871
[CELEBORN-47][IMPROVEMENT] Refine logs about tracking fetch chunk (#995) 2022-11-23 11:56:10 +08:00
Keyong Zhou
cfc1fa15bd
[CELEBORN-46] Refine log for RssInputStream.close() (#994) 2022-11-22 22:01:08 +08:00
Shuang
1656458788
[CELEBORN-14] [ISSUE-955] support register attempt map task (#984) 2022-11-22 15:23:20 +08:00
Angerszhuuuu
5ec278f99a
[ISSUE-987][FEATURE] During worker shutdown, return HARD_SPLIT for all existed partition (#988) 2022-11-22 14:29:55 +08:00
Shuang
fb6d1de108
[CELEBORN-8] [ISSUE-952][FEATURE] support register shuffle task in map partition mode (#973) 2022-11-16 21:46:19 +08:00
Angerszhuuuu
64e8ebf158
[ISSUE-925][FOLLOWUP] Refactor class name of RetryingChunkReceiveCallback (#954) 2022-11-11 14:00:47 +08:00
leesf
0b8376e2c7
Cleanup some code (#943) 2022-11-11 13:58:39 +08:00