59 Commits

Author SHA1 Message Date
Angerszhuuuu
5471a6afe5
[CELEBORN-804] ShuffleClient should cleanup shuffle infos when trigger unregisterShuffle
### What changes were proposed in this pull request?

After discussion, we confirmed that Spark triggers `shuffleManager.unregisterShuffle()` on both the driver and the executors. In this PR:

  1. Add a shuffle client on both the driver and executor sides in ShuffleManager.
  2. ShuffleClient calls `cleanupShuffle()` when `unregisterShuffle` is triggered.

This replaces https://github.com/apache/incubator-celeborn/pull/1719
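
A minimal sketch of the cleanup idea described above, assuming the client keeps per-shuffle state in a map keyed by shuffle ID. The class `ShuffleStateRegistry`, its field, and `register` are hypothetical and do not reflect Celeborn's actual `ShuffleClient` internals; only the names `unregisterShuffle` and `cleanupShuffle` come from this PR.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the per-shuffle state a client holds.
final class ShuffleStateRegistry {
  // shuffleId -> cached location descriptions (illustrative only)
  private final Map<Integer, Set<String>> locationsByShuffle = new ConcurrentHashMap<>();

  void register(int shuffleId, String location) {
    locationsByShuffle
        .computeIfAbsent(shuffleId, id -> ConcurrentHashMap.newKeySet())
        .add(location);
  }

  // Drops everything cached for one shuffle; returns false if nothing was cached.
  boolean cleanupShuffle(int shuffleId) {
    return locationsByShuffle.remove(shuffleId) != null;
  }

  // Mirrors the idea in this PR: unregistering a shuffle also triggers cleanup.
  boolean unregisterShuffle(int shuffleId) {
    return cleanupShuffle(shuffleId);
  }

  int trackedShuffles() {
    return locationsByShuffle.size();
  }
}
```

Because `cleanupShuffle` is idempotent in this sketch, it does not matter if the driver and an executor both trigger it for the same shuffle.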

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1726 from AngersZhuuuu/CELEBORN-804.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-19 20:50:18 +08:00
onebox-li
405b2801fa [CELEBORN-810] Fix some typos and grammar
### What changes were proposed in this pull request?
Fix some typos and grammar

### Why are the changes needed?
Ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #1733 from onebox-li/fix-typo.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-19 18:35:38 +08:00
Cheng Pan
0db919403e Revert "[CELEBORN-798] Add heartbeat from client to LifecycleManager to clean…"
This reverts commit e56a8a8bed.
2023-07-19 15:08:45 +08:00
zky.zhoukeyong
e56a8a8bed [CELEBORN-798] Add heartbeat from client to LifecycleManager to clean…
…up client

### What changes were proposed in this pull request?
Add a heartbeat from the client to the LifecycleManager. In this PR, the heartbeat request contains the client's
local shuffle IDs; the LifecycleManager checks them against its own local set and returns the IDs it doesn't know.
Upon receiving the response, the client calls `unregisterShuffle` for cleanup.
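
The reconciliation step described above amounts to a set difference. The following is a sketch under stated assumptions: `HeartbeatReconciler` and both method names are hypothetical, and removing an ID from the client's set stands in for the real `unregisterShuffle` call.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of Celeborn's API.
final class HeartbeatReconciler {
  // Manager side: IDs the client reported that the manager does not know.
  static Set<Integer> unknownShuffleIds(Set<Integer> reportedByClient,
                                        Set<Integer> knownToManager) {
    Set<Integer> unknown = new HashSet<>(reportedByClient);
    unknown.removeAll(knownToManager);
    return unknown;
  }

  // Client side: drop local state for every ID the manager did not recognize.
  // In the PR this is where unregisterShuffle would be called per ID.
  static void applyResponse(Set<Integer> clientLocalIds, Set<Integer> unknownIds) {
    clientLocalIds.removeAll(unknownIds);
  }
}
```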

### Why are the changes needed?
Before this PR, the client-side `unregisterShuffle` was never called. When running 3T TPC-DS with the Spark Thrift Server
without DRA, I found that the executor's heap contained 1.6 million PartitionLocation objects (and StorageInfo):
![image](https://github.com/apache/incubator-celeborn/assets/948245/43658369-7763-4511-a5b0-9b3fbdf02005)

After this PR, the number of PartitionLocation objects decreases to 275 thousand:
![image](https://github.com/apache/incubator-celeborn/assets/948245/45f8f849-186d-4cad-83c8-64bd6d18debc)

This heartbeat can be extended in the future for other purposes, e.g. reporting the client's metrics.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passes GA and manual test.

Closes #1719 from waitinfuture/798.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-17 18:14:10 +08:00
Angerszhuuuu
693172d0bd [CELEBORN-751] Rename remain rss related class name and filenames etc
### What changes were proposed in this pull request?
Rename the remaining RSS-related class names, file names, etc.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1664 from AngersZhuuuu/CELEBORN-751.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2023-07-04 10:20:08 +08:00
Fu Chen
adbd38a926
[CELEBORN-726][FOLLOWUP] Update data replication terminology from master/slave to primary/replica in the codebase
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

In order to distinguish it from the existing master/worker, refactor data replication terminology to 'primary/replica' for improved clarity and inclusivity in the codebase

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes #1639 from cfmcgrady/primary-replica.

Lead-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-29 17:07:26 +08:00
zky.zhoukeyong
57b0e815cf [CELEBORN-656] Batch revive RPCs in client to avoid too many requests
### What changes were proposed in this pull request?
This PR batches revive requests and periodically sends them to LifecycleManager to reduce the number of RPC requests.

In more detail, this PR changes the Revive message to support multiple unique partitions, and also passes a set of unique mapIds for checking MapEnd. Each time ShuffleClientImpl wants to revive, it adds a ReviveRequest to ReviveManager and waits for the result. ReviveManager batches revive requests and periodically sends them to LifecycleManager (deduplicated by partitionId). LifecycleManager constructs a ChangeLocationsCallContext and, after all locations are notified, replies to ShuffleClientImpl.
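
The batching and deduplication step described above can be sketched as follows. This is an illustration only: `ReviveBatcher` and its nested `Request` type are hypothetical, the real Revive message carries more fields, and keeping the first request per partitionId is an assumption about how duplicates are resolved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batcher; not Celeborn's ReviveManager.
final class ReviveBatcher {
  // Simplified request shape for illustration.
  static final class Request {
    final int partitionId;
    final int mapId;

    Request(int partitionId, int mapId) {
      this.partitionId = partitionId;
      this.mapId = mapId;
    }
  }

  private final List<Request> pending = new ArrayList<>();

  synchronized void add(Request request) {
    pending.add(request);
  }

  // Drains pending requests, keeping the first request seen per partitionId.
  synchronized List<Request> drainDeduplicated() {
    Map<Integer, Request> byPartition = new LinkedHashMap<>();
    for (Request request : pending) {
      byPartition.putIfAbsent(request.partitionId, request);
    }
    pending.clear();
    return new ArrayList<>(byPartition.values());
  }
}
```

In a real client, a scheduled task would call something like `drainDeduplicated()` at a fixed interval and send the result as one RPC, which is what turns many per-partition requests into a few batched ones.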

### Why are the changes needed?
In my test of 3T TPC-DS q23a with 3 Celeborn workers, when a worker is killed, the LifecycleManager receives about 48,000 Revive requests:
```
[emr-usermaster-1-1 logs]$ cat spark-emr-user-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master-1-1.c-fa08904e94c028d1.out.1 |grep -i revive |wc -l
64364
```
After this PR, the number of ReviveBatch requests drops to 708:
```
[emr-usermaster-1-1 logs]$ cat spark-emr-user-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master-1-1.c-fa08904e94c028d1.out |grep -i revive |wc -l
2573
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test. I have tested:

1. Disable graceful shutdown, kill one worker, job succeeds
2. Disable graceful shutdown, kill two workers successively, job fails as expected
3. Enable graceful shutdown, restart two workers successively, job succeeds
4. Enable graceful shutdown, restart two workers successively, then kill the third one, job succeeds

Closes #1588 from waitinfuture/656-2.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-06-27 22:11:04 +08:00
Shuang
fe2f76dba6 [CELEBORN-717][FLINK][FOLLOWUP] Fix ResultPartition lost numBytesOut/numBuffersOut metrics
### What changes were proposed in this pull request?
The metrics update logic needs to align with Flink 1.17/1.15.

### Why are the changes needed?
See [1626](https://github.com/apache/incubator-celeborn/pull/1626). The metrics update logic needs to align with Flink 1.17/1.15.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual TPC-DS test.

Closes #1631 from RexXiong/CELEBORN-717-FOLLOWUP.

Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>
2023-06-27 21:47:41 +08:00
Shuang
22b21295e8
[CELEBORN-717][FLINK] Fix ResultPartition lost numBytesOut/numBuffersOut metrics
### What changes were proposed in this pull request?
Reset numBytesOut/numBuffersOut metrics for RemoteShuffleResultPartition.

### Why are the changes needed?
Currently ResultPartition loses the numBytesOut/numBuffersOut metrics, so the Flink AdaptiveScheduler cannot dynamically adjust task parallelism based on the amount of input data.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #1626 from RexXiong/CELEBORN-717.

Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-06-27 11:49:00 +08:00
zky.zhoukeyong
6b82ecdfa0 [CELEBORN-712] Make appUniqueId a member of ShuffleClientImpl and refactor code
### What changes were proposed in this pull request?
Make appUniqueId a member of ShuffleClientImpl and remove applicationId from client-side RPC messages, so it won't cause compatibility issues.

### Why are the changes needed?
Currently a Celeborn client is bound to a single application id, so there's no need to pass applicationId around in many client-side RPC messages.

### Does this PR introduce _any_ user-facing change?
In some logs the application id will not be printed, which should not be a problem.

### How was this patch tested?
UTs.

Closes #1621 from waitinfuture/appid.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-25 21:37:16 +08:00
onebox-li
47f66a87a1 [CELEBORN-678] ShuffleClientImpl::mapperEnded should not consider attemptId
### What changes were proposed in this pull request?
ShuffleClientImpl::mapperEnded should not consider attemptId, because speculative tasks change the attemptId.
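
A minimal sketch of the idea, assuming the client tracks finished mappers in a set. `MapEndTracker` and its methods are hypothetical and do not mirror the real `ShuffleClientImpl` signatures; the point is only that the lookup key omits attemptId, so a speculative attempt of an already finished mapper is still recognized as ended.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker; not Celeborn's implementation.
final class MapEndTracker {
  private final Set<Integer> endedMapIds = ConcurrentHashMap.newKeySet();

  void markEnded(int mapId, int attemptId) {
    // attemptId is deliberately not part of the key.
    endedMapIds.add(mapId);
  }

  boolean mapperEnded(int mapId, int attemptId) {
    // Any finished attempt of this mapId counts, whatever attemptId asks.
    return endedMapIds.contains(mapId);
  }
}
```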

### Why are the changes needed?
ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Cluster test

Closes #1591 from onebox-li/fix-mapend.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-14 21:01:06 +08:00
Shuang
e284f72c95 [CELEBORN-660][FLINK] Gen unique app id for Celeborn
### What changes were proposed in this pull request?
Use System.currentTimeMillis() + JobID.generate() as CelebornAppId.
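
A sketch of the ID scheme, under stated assumptions: a random UUID stands in for Flink's `JobID.generate()` so the example has no Flink dependency, and the hyphen separator is an assumption, since the PR text does not say how the two parts are joined. `AppIdSketch` is a hypothetical name.

```java
import java.util.UUID;

// Hypothetical helper; the real code uses Flink's JobID.generate().
final class AppIdSketch {
  // Joins a timestamp and a random part; the "-" separator is an assumption.
  static String newAppId(long nowMillis, String randomPart) {
    return nowMillis + "-" + randomPart;
  }

  static String newAppId() {
    // A UUID without dashes stands in for a freshly generated JobID.
    String randomPart = UUID.randomUUID().toString().replace("-", "");
    return newAppId(System.currentTimeMillis(), randomPart);
  }
}
```

The random part is what keeps two applications apart even when the job ID supplied by the framework is fixed; the timestamp mainly makes the IDs easier to order and read.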

### Why are the changes needed?
Flink Application mode with HA may use a fixed ID (00000000000000000000000000000000) as the jobId. See [FLINK-19358](https://issues.apache.org/jira/browse/FLINK-19358).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual

Closes #1572 from RexXiong/CELEBORN-660.

Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>
2023-06-13 11:15:16 +08:00
Cheng Pan
76533d7324
[CELEBORN-650][TEST] Upgrade scalatest and unify mockito version
### What changes were proposed in this pull request?

This PR upgrades

- `mockito` from 1.10.19 and 3.6.0 to 4.11.0
- `scalatest` from 3.2.3 to 3.2.16
- `mockito-scalatest` from 1.16.37 to 1.17.14

### Why are the changes needed?

Housekeeping, making test dependencies up-to-date and unified.

### Does this PR introduce _any_ user-facing change?

No, it only affects test.

### How was this patch tested?

Pass GA.

Closes #1562 from pan3793/CELEBORN-650.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-09 10:04:14 +08:00
Ethan Feng
76a42beab0
[CELEBORN-610][FLINK] Eliminate pluginconf and merge its content to CelebornConf
### What changes were proposed in this pull request?
PluginConf is hard to understand: it is unclear why Celeborn needs a separate config class. This PR eliminates it and merges its content into CelebornConf.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UT.

Closes #1524 from FMX/CELEBORN-610.

Authored-by: Ethan Feng <ethanfeng@apache.org>
Signed-off-by: Ethan Feng <ethanfeng@apache.org>
2023-06-05 14:08:53 +08:00
Angerszhuuuu
cf308aa057
[CLEBORN-595] Refine code frame of CelebornConf (#1525) 2023-06-01 10:37:58 +08:00
Ethan Feng
4ee7d9eba8
[CELEBORN-597][FLINK] Support flink floating buffer for input gate and output gate. (#1503) 2023-05-24 23:15:57 +08:00
Shuang
f83304c337
[CELEBORN-581][Flink] Support JobManager failover. (#1485) 2023-05-16 14:51:53 +08:00
zhongqiangchen
5769c3fdc7
[CELEBORN-552] Add HeartBeat between the client and worker to keep alive (#1457) 2023-05-10 19:35:51 +08:00
Ethan Feng
114b1b4d62
[CELEBORN-548][FLINK] Support flink 1.17. (#1472) 2023-05-05 23:00:49 +08:00
Ethan Feng
93d2f106e0
[CELEBORN-548][FLINK] Support flink 1.15. (#1463) 2023-05-04 15:23:59 +08:00
Angerszhuuuu
ef4c12e0fe
[CELEBORN-565] FETCH_MAX_RETRIES should double when enable replicates (#1471) 2023-04-28 14:27:35 +08:00
Ethan Feng
7937d96226
[CELEBORN-535][FLINK] Reduce message decoder overhead. (#1438) 2023-04-19 11:00:29 +08:00
Ethan Feng
9cccfc9872
[CELEBORN-431][FLINK] Support dynamic buffer allocation in reading map partition. (#1407) 2023-04-13 10:37:47 +08:00
Shuang
c7e08ed22b
[CELEBORN-514][FLINK] RssBufferStream need guarantee close the stream. (#1417) 2023-04-12 18:35:33 +08:00
Shuang
45013b8bae
[CELEBORN-489][FLINK]fix retry client for open stream (#1397) 2023-03-30 11:44:19 +08:00
Ethan Feng
6cee85748d
[CELEBORN-477][FLINK] Report failed partition to flink framework. (#1391) 2023-03-28 15:54:37 +08:00
Fei Wang
7c444cb0c5
[CELEBORN-474] Speed up ConcurrentHashMap#computeIfAbsent (#1383) 2023-03-26 09:41:59 +08:00
Keyong Zhou
637081f604
[CELEBORN-428][FLINK] Remove unnecessary lock in PartitionSortedBuffer (#1352) 2023-03-16 11:56:23 +08:00
Shuang
4f6e90a7d9
[CELEBORN-418][FLINK][FOLLOW UP]Need drop unused bytes from netty when task was already failed (#1350) 2023-03-14 21:04:11 +08:00
Ethan Feng
6f317c77ee
[CELEBORN-422][FLINK] Remove unused fields in ReadData. (#1347) 2023-03-14 19:49:00 +08:00
Shuang
63ab1a66a9
[CELEBORN-418][FLINK] Need drop unused bytes from netty when task was already failed (#1348) 2023-03-14 19:48:41 +08:00
Shuang
cd5241d399
[CELEBORN-381][FLINK] notify the task with the error message when channel in active. (#1341) 2023-03-14 11:28:03 +08:00
Ethan Feng
971c93d4d9
[CELEBORN-419][FLINK] Fix memory leak when receive RPCs with body. (#1343) 2023-03-14 11:27:36 +08:00
Shuang
b499c0df7f
[CELEBORN-417][FLINK] fix memory leak when handler already removed (#1342) 2023-03-13 23:58:21 +08:00
Ethan Feng
c78023824a
[CELEBORN-397][FLINK] Flink plugin support UnpooledByteBufAllocator. (#1324) 2023-03-13 11:36:13 +08:00
Ethan Feng
2d4a4f25bd
[CELEBORN-389][FLINK] Fix remove transportClient from readClientHandler caused NPE (#1323) 2023-03-10 14:43:25 +08:00
Keyong Zhou
21bdfdb21b
[CELEBORN-390][FLINK] Refine synchronization in FlinkShuffleClientImpl#updateFileGroup (#1320) 2023-03-09 16:49:18 +08:00
Ethan Feng
8e167c6488
[CELEBORN-387][FLINK] Remove unnecessary limitZeroInFlight from sendMessageInternal. (#1319) 2023-03-09 12:31:09 +08:00
Keyong Zhou
fd1ac2f711
[CELEBORN-379][FLINK] Fix checkState in TransportFrameDecoderWithBufferSupplier#decodeBodyCopyOut (#1311) 2023-03-06 21:49:18 +08:00
Ethan Feng
675a7da393
[CELEBORN-368][FLINK] Pass exceptions in buffer stream. (#1304) 2023-03-03 15:43:30 +08:00
zhongqiangchen
9dc1bc2b1c
[CELEBORN-367] [FLINK] Move pushdata functions used by mappartition from ShuffleClientImpl to FlinkShuffleClientImpl (#1295) 2023-03-02 18:50:38 +08:00
zhongqiangchen
cb76c4de4c
[CELEBORN-350][FLINK] Add PluginConf to be compatible with old configurations 2023-02-28 20:36:11 +08:00
Shuang
935806f036
[CELEBORN-341][Flink] cache file group for map partition in Flink plugin (#1277) 2023-02-26 20:31:20 +08:00
Shuang
a963aa4b4f
[CELEBORN-333][FLINK] bypass unexpected backlog message when stream closed. (#1268) 2023-02-23 18:49:14 +08:00
Shuang
9754616d79
[CELEBORN-330] fix deadlock when use the same netty channel to receive data while other thread wait the response (#1265) 2023-02-23 17:57:43 +08:00
Ethan Feng
cb8df62ec5
[CELEBORN-324][FLINK] Flink plugin needs reuse connections. (#1257) 2023-02-21 18:32:00 +08:00
Ethan Feng
5dd5e97225
[CELEBORN-322][Flink] Copy out message if it‘s readData only. (#1255) 2023-02-21 15:51:13 +08:00
Ethan Feng
c649655933
Revert "[CELEBORN-322][Flink] Copy out message if it‘s readData only."
This reverts commit 0aa37ed7d3.
2023-02-21 14:48:08 +08:00
Ethan Feng
0aa37ed7d3
[CELEBORN-322][Flink] Copy out message if it‘s readData only. 2023-02-21 14:45:39 +08:00
Ethan Feng
0df08fbdf3
[CELEBORN-320][FLINK] fix handle wrong message type in FetchHandler. (#1254) 2023-02-21 11:51:01 +08:00