Commit Graph

160 Commits

Author SHA1 Message Date
Angerszhuuuu
6623309237
[CELEBORN-1143][BUG] SortBasedPusher pushData should inc memory spill metrics
### What changes were proposed in this pull request?
SortBasedPusher `pushData` should inc memory spill metrics

### Why are the changes needed?
Make metrics more accurate.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #2117 from AngersZhuuuu/CELEBORN-1143.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-11-28 16:26:00 +08:00
onebox-li
927d62425b
[CELEBORN-1125][FOLLOWUP] Add failureaccess shade
### What changes were proposed in this pull request?
Add failureaccess shade.

### Why are the changes needed?
When testing the main branch, the client got the following error:
```
Caused by: java.lang.NoClassDefFoundError: org/apache/celeborn/shaded/com/google/common/util/concurrent/internal/InternalFutureFailureAccess
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3517)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3521)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2170)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2081)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:4019)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4933)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.replyGetReducerFileGroup(ReducePartitionCommitHandler.scala:283)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.handleGetReducerFileGroup(ReducePartitionCommitHandler.scala:300)
	at org.apache.celeborn.client.CommitManager.handleGetReducerFileGroup(CommitManager.scala:266)
	at org.apache.celeborn.client.LifecycleManager.org$apache$celeborn$client$LifecycleManager$$handleGetReducerFileGroup(LifecycleManager.scala:628)
	at org.apache.celeborn.client.LifecycleManager$$anonfun$receiveAndReply$1.applyOrElse(LifecycleManager.scala:314)
	at org.apache.celeborn.common.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.celeborn.common.rpc.netty.Inbox.safelyCall(Inbox.scala:222)
	at org.apache.celeborn.common.rpc.netty.Inbox.process(Inbox.scala:110)
	at org.apache.celeborn.common.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #2116 from onebox-li/shade-add-failureaccess.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-27 18:01:38 +08:00
Erik.fang
aee41555c6 [CELEBORN-955] Re-run Spark Stage for Celeborn Shuffle Fetch Failure
### What changes were proposed in this pull request?
Currently, Celeborn uses replication to handle shuffle data loss for the Celeborn shuffle reader. This PR implements an alternative solution based on Spark stage resubmission.

Design doc:
https://docs.google.com/document/d/1dkG6fww3g99VAb1wkphNlUES_MpngVPNg8601chmVp8/edit

### Why are the changes needed?
Spark stage resubmission uses fewer resources than replication, and some Celeborn users have also asked for it.

### Does this PR introduce _any_ user-facing change?
A new config, `celeborn.client.fetch.throwsFetchFailure`, is introduced to enable this feature.

### How was this patch tested?
Two UTs are attached, and we also tested it in Ant Group's dev Spark cluster.

Closes #1924 from ErikFang/Re-run-Spark-Stage-for-Celeborn-Shuffle-Fetch-Failure.

Lead-authored-by: Erik.fang <fmerik@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-26 16:47:58 +08:00
gaochao0509
ca47cce78e [CELEBORN-1123] Support fallback to non-columnar shuffle for schema that cannot be obtained from shuffle dependency
### What changes were proposed in this pull request?

Support fallback to non-columnar shuffle for schema that cannot be obtained from shuffle dependency.

### Why are the changes needed?

When columnar shuffle is enabled, shuffle operators on Spark RDDs are not supported, because their schema cannot be obtained from the shuffle dependency. This PR therefore supports falling back to non-columnar shuffle in that case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `CelebornColumnarShuffleReaderSuite#columnarShuffleReaderNewSerializerInstance`
- `ColumnarHashBasedShuffleWriterSuiteJ#createColumnarShuffleWriter`

Closes #2101 from gaochao0509/CELEBORN-1123.

Authored-by: gaochao0509 <1623735386@qq.com>
Signed-off-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
2023-11-20 19:48:56 +08:00
TongWei1105
0583cdb5a8 [CELEBORN-1048] Align fetchWaitTime metrics to spark implementation
### What changes were proposed in this pull request?
Align fetchWaitTime metrics to spark implementation

### Why are the changes needed?
In our production environment, there are variations in the fetchWaitTime metric for the same stage of the same job.

On YARN ESS:
![image](https://github.com/apache/incubator-celeborn/assets/68682646/601a8315-1317-48dc-b9a6-7ea651d5122d)
On Celeborn:
![image](https://github.com/apache/incubator-celeborn/assets/68682646/e00ed60f-3789-4330-a7ed-fdd5754acf1d)
Then, based on the implementation of Spark's ShuffleBlockFetcherIterator, I adjusted the fetchWaitTime metrics code.

Now the metric looks more reasonable:
![image](https://github.com/apache/incubator-celeborn/assets/68682646/ce5e46e4-8ed2-422e-b54b-cd594aad73dd)

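
Below is a minimal Java sketch of the Spark-style accounting. It assumes the metric should cover only the time the reader is blocked waiting for the next fetched chunk, not the time spent processing data; Spark's `ShuffleBlockFetcherIterator` times its blocking `results.take()` call in the same way. `FetchWaitTimer` and its queue are hypothetical stand-ins, not Celeborn's reader code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical reader: only the blocking take() counts toward fetch wait time.
class FetchWaitTimer {
  private final BlockingQueue<byte[]> fetchedChunks;
  private long fetchWaitTimeMs = 0L;

  FetchWaitTimer(BlockingQueue<byte[]> fetchedChunks) {
    this.fetchedChunks = fetchedChunks;
  }

  byte[] nextChunk() throws InterruptedException {
    long start = System.nanoTime();
    byte[] chunk = fetchedChunks.take(); // blocks until a fetched chunk is available
    fetchWaitTimeMs += TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    return chunk; // work the caller does on this chunk is not counted as waiting
  }

  long fetchWaitTimeMs() {
    return fetchWaitTimeMs;
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    FetchWaitTimer reader = new FetchWaitTimer(queue);
    // A producer thread delivers one chunk roughly 50 ms after the reader starts waiting.
    Thread producer = new Thread(() -> {
      try {
        Thread.sleep(50);
        queue.put(new byte[16]);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();
    byte[] chunk = reader.nextChunk();
    producer.join();
    System.out.println(chunk.length + " bytes after waiting " + reader.fetchWaitTimeMs() + " ms");
  }
}
```
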
### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested in our production environment.

Closes #2000 from TongWei1105/CELEBORN-1048.

Lead-authored-by: TongWei1105 <vvtwow@gmail.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-11-02 15:27:30 +08:00
sychen
70366edb7c [CELEBORN-1096] Avoid initializing SortShuffleManager when stop
### What changes were proposed in this pull request?

### Why are the changes needed?
If the application never falls back to Spark's SortShuffleManager, stopping the Celeborn shuffle manager still initializes SortShuffleManager once, only to stop it again. This is unnecessary.
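
The idea can be sketched as a lazily initialized holder. The sketch assumes the fallback manager is only needed once an application actually falls back. `LazyFallback` is a hypothetical helper for illustration; it does not reflect the field layout of Celeborn's `SparkShuffleManager`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical holder for a fallback shuffle manager that is created lazily.
// stop() only stops an instance that already exists; it never creates one.
class LazyFallback<T> {
  private final Supplier<T> factory;
  private final Consumer<T> stopper;
  private T instance; // guarded by "this"; stays null until the first fallback

  LazyFallback(Supplier<T> factory, Consumer<T> stopper) {
    this.factory = factory;
    this.stopper = stopper;
  }

  synchronized T get() {
    if (instance == null) {
      instance = factory.get(); // first fallback: create the manager now
    }
    return instance;
  }

  synchronized void stop() {
    if (instance != null) {
      stopper.accept(instance);
      instance = null;
    }
  }

  public static void main(String[] args) {
    AtomicInteger created = new AtomicInteger();
    LazyFallback<StringBuilder> fallback = new LazyFallback<>(
        () -> {
          created.incrementAndGet();
          return new StringBuilder("sort-shuffle-manager");
        },
        manager -> manager.setLength(0));
    fallback.stop(); // the application never fell back, so nothing is created or stopped
    System.out.println("instances created: " + created.get()); // prints "instances created: 0"
  }
}
```
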

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2052 from cxzl25/CELEBORN-1096.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-10-30 14:21:06 +08:00
SteNicholas
df40a28959 [CELEBORN-1032][FOLLOWUP] Use scheduleWithFixedDelay instead of scheduleAtFixedRate in threads pool of master and worker
### What changes were proposed in this pull request?

Use `scheduleWithFixedDelay` instead of `scheduleAtFixedRate` in the thread pools of Celeborn Master and Worker.
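
Here is a small Java sketch of the difference, using only `java.util.concurrent`. `scheduleAtFixedRate` computes each start time from the previous scheduled start, so a run that overruns its period is followed immediately by the next one. `scheduleWithFixedDelay` always waits the full delay after a run finishes. The task and timings below are made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedDelayDemo {
  // Schedules a task that takes taskMs, waiting delayMs after each run finishes,
  // and reports how many runs completed within windowMs.
  static int runsWithin(long windowMs, long taskMs, long delayMs) throws InterruptedException {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger runs = new AtomicInteger();
    pool.scheduleWithFixedDelay(() -> {
      try {
        Thread.sleep(taskMs); // simulate a periodic check that runs long
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return; // interrupted by shutdownNow(): do not count this run
      }
      runs.incrementAndGet();
    }, 0, delayMs, TimeUnit.MILLISECONDS);
    Thread.sleep(windowMs);
    pool.shutdownNow();
    pool.awaitTermination(1, TimeUnit.SECONDS);
    return runs.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // Each cycle takes at least taskMs + delayMs = 200 ms, so about 5 runs fit in 1 s.
    // With scheduleAtFixedRate(..., 100 ms) the same task would be started back to back.
    System.out.println("runs in 1s: " + runsWithin(1000, 100, 100));
  }
}
```
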

### Why are the changes needed?

Follow up #1970.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Internal tests.

Closes #2048 from SteNicholas/CELEBORN-1032.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-10-27 11:20:28 +08:00
SteNicholas
49ea881037
[MINOR] Remove unnecessary increment index of Master#timeoutDeadWorkers
### What changes were proposed in this pull request?

Remove the unnecessary index increment in `Master#timeoutDeadWorkers`.

### Why are the changes needed?

The index increment in `Master#timeoutDeadWorkers` is unnecessary.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2027 from SteNicholas/timeout-dead-workers.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-10-23 22:18:39 +08:00
Mridul Muralidharan
eb382b018c
[CELEBORN-1072] Fix misc error prone reports found
Fix misc error prone reports.
As detailed in the JIRA, they are:
* Reference equality of boxed primitive types: see [BoxedPrimitiveEquality](https://errorprone.info/bugpattern/BoxedPrimitiveEquality)
* Calling run directly - since use is legitimate, mark it as ignore. See: [DoNotCall](https://errorprone.info/bugpattern/DoNotCall)
* `Ignore` test instead of catching `AssertionError` and ignoring it. See: [AssertionFailureIgnored](https://errorprone.info/bugpattern/AssertionFailureIgnored)
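
The following is a minimal illustration of the first pattern. The class is hypothetical, not the code that was fixed. `==` on two `Integer` objects compares references, which only happens to work for values served from the `Integer` cache (-128 to 127 by default).

```java
import java.util.Objects;

class BoxedEqualityDemo {
  // Flagged by BoxedPrimitiveEquality: == compares Integer references, not values.
  static boolean sameByReference(Integer a, Integer b) {
    return a == b;
  }

  // Fixed version: compares values and is null-safe.
  static boolean sameByValue(Integer a, Integer b) {
    return Objects.equals(a, b);
  }

  public static void main(String[] args) {
    Integer x = 1000;
    Integer y = 1000; // outside the default Integer cache, so usually two distinct objects
    System.out.println(sameByReference(x, y)); // typically false
    System.out.println(sameByValue(x, y)); // true
  }
}
```
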

Fix misc error prone reports.

No

Unit tests

Closes #2019 from mridulm/fix-misc-issues.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-10-23 11:15:10 +08:00
Jiaan Geng
cdcd30dc2f [CELEBORN-1041] Improve the implementation for get the PartitionIdPassthrough class
### What changes were proposed in this pull request?
Currently, the code that gets the constructor of `PartitionIdPassthrough` is very redundant.
We should improve the implementation.

### Why are the changes needed?
Improve the implementation for getting the `PartitionIdPassthrough` class.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New test cases.

Closes #1989 from beliefer/CELEBORN-1041.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-10-16 14:24:49 +08:00
sychen
62ba44d8da [CELEBORN-1026] Optimize registerShuffle fallback log
### What changes were proposed in this pull request?

### Why are the changes needed?
According to https://github.com/apache/incubator-celeborn/pull/1955, when Celeborn is not available and both `spark.dynamicAllocation.enabled=true` and `spark.shuffle.service.enabled=true` are set, shuffle data should not be lost.
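
One way to express the decision is sketched below. It assumes the rule implied by #1955: falling back only risks losing shuffle data when dynamic allocation is on and the external shuffle service is off, so only that combination deserves an error-level message. `FallbackLogLevel` is hypothetical. `java.util.logging` levels stand in for the ERROR/WARN levels used in the fallback message.

```java
import java.util.logging.Level;

class FallbackLogLevel {
  // Hypothetical rule: shuffle data written by Spark's own shuffle manager can be lost
  // only when executors may be removed (DRA on) and no external shuffle service keeps it.
  static Level levelFor(boolean dynamicAllocationEnabled, boolean shuffleServiceEnabled) {
    if (dynamicAllocationEnabled && !shuffleServiceEnabled) {
      return Level.SEVERE; // java.util.logging's equivalent of ERROR
    }
    return Level.WARNING;
  }

  public static void main(String[] args) {
    System.out.println(levelFor(true, false)); // SEVERE
    System.out.println(levelFor(true, true)); // WARNING
  }
}
```
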

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1963 from cxzl25/CELEBORN-1026.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-10-11 20:25:44 +08:00
Jiaan Geng
a776e7e315 [CELEBORN-1019] Avoid magic strings copy from Spark SQLConf
### What changes were proposed in this pull request?
Avoid magic strings copied from Spark `SQLConf`.

### Why are the changes needed?
Currently, the Spark integration uses many magic strings copied from Spark `SQLConf`.
Since we already depend on Spark, referencing the constants from Spark is better.

### Does this PR introduce _any_ user-facing change?
No.
Just update the inner implementation.

### How was this patch tested?
Existing test cases.

Closes #1968 from beliefer/CELEBORN-1019.

Lead-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-10-10 19:52:24 +08:00
Fu Chen
b2412d0774 [CELEBORN-1022][TEST] Update log level from FATAL to ERROR for console output in unit tests
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

1. This is developer-friendly when debugging unit tests in IntelliJ IDEA. For example, Netty's memory leak reports are logged at ERROR level, so they only appear on the console when its threshold is ERROR rather than FATAL:

```
23/10/09 09:57:26,422 ERROR [fetch-server-52-2] ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
	io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:403)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
	io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:140)
	io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
	io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:150)
	io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	java.lang.Thread.run(Thread.java:750)
```

2. This won't increase console output or affect the stability of CI.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1958 from cfmcgrady/ut-console-log-level.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-10-09 15:56:05 +08:00
Cheng Pan
78553f1418
[CELEBORN-1003] Correct the LICENSE and NOTICE for shaded client jars
### What changes were proposed in this pull request?

Correct the `LICENSE` and `NOTICE` for the following shaded client jars

- `celeborn-client-flink-1.14-shaded_2.12-<version>.jar`
- `celeborn-client-flink-1.15-shaded_2.12-<version>.jar`
- `celeborn-client-flink-1.17-shaded_2.12-<version>.jar`
- `celeborn-client-mr-shaded_2.12-<version>.jar`
- `celeborn-client-spark-2-shaded_2.11-<version>.jar`
- `celeborn-client-spark-3-shaded_2.12-<version>.jar`

### Why are the changes needed?

The `LICENSE` and `NOTICE` shipped in a jar should match the content of the jar. For shaded jars, they should acknowledge all the bundled third-party classes.

See more discussion at https://lists.apache.org/thread/8v4wy5o132rpsjync6465zztgjlf6h5p

To determine which third-party jars are bundled, take `celeborn-client-spark-3-shaded_2.12-<version>.jar` as an example: the following command performs the packaging, and the bundled jars can be identified from log lines like `Including ... in the shaded jar`.

```
build/mvn clean package -DskipTests -pl :celeborn-client-spark-3-shaded_2.12 -am -Pspark-3.3
```

```
[INFO] --- maven-shade-plugin:3.4.0:shade (default) @ celeborn-client-spark-3-shaded_2.12 ---
[INFO] Including org.apache.celeborn:celeborn-client-spark-3_2.12:jar:0.4.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.celeborn:celeborn-common_2.12:jar:0.4.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.12.0 in the shaded jar.
[INFO] Including io.netty:netty-all:jar:4.1.93.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.93.Final in the shaded jar.
...
[INFO] Excluding org.apache.ratis:ratis-common:jar:2.5.1 from the shaded jar.
[INFO] Excluding org.apache.ratis:ratis-thirdparty-misc:jar:1.0.4 from the shaded jar.
[INFO] Excluding org.apache.ratis:ratis-proto:jar:2.5.1 from the shaded jar.
...
```
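
Rather than reading the log by eye, a small Java sketch can collect the coordinates from the `Including ... in the shaded jar.` lines and drop Celeborn's own modules. What remains are the third-party artifacts that `LICENSE` and `NOTICE` must cover. `ShadeLogParser` is hypothetical and assumes the exact log format shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShadeLogParser {
  // Matches lines such as
  // "[INFO] Including io.netty:netty-buffer:jar:4.1.93.Final in the shaded jar."
  private static final Pattern INCLUDING =
      Pattern.compile("^\\[INFO\\] Including (\\S+) in the shaded jar\\.$");

  // Returns the Maven coordinates of every artifact bundled into the shaded jar.
  static List<String> bundledArtifacts(List<String> logLines) {
    List<String> bundled = new ArrayList<>();
    for (String line : logLines) {
      Matcher m = INCLUDING.matcher(line.trim());
      if (m.matches()) {
        bundled.add(m.group(1));
      }
    }
    return bundled;
  }

  // Keeps only third-party artifacts, i.e. the ones LICENSE/NOTICE must acknowledge.
  static List<String> thirdParty(List<String> coordinates) {
    List<String> result = new ArrayList<>();
    for (String c : coordinates) {
      if (!c.startsWith("org.apache.celeborn:")) {
        result.add(c);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> log = Arrays.asList(
        "[INFO] Including org.apache.celeborn:celeborn-common_2.12:jar:0.4.0-SNAPSHOT in the shaded jar.",
        "[INFO] Including io.netty:netty-buffer:jar:4.1.93.Final in the shaded jar.",
        "[INFO] Excluding org.apache.ratis:ratis-common:jar:2.5.1 from the shaded jar.");
    System.out.println(thirdParty(bundledArtifacts(log))); // [io.netty:netty-buffer:jar:4.1.93.Final]
  }
}
```

In practice the lines would come from a saved build log. For example, `java.nio.file.Files.readAllLines(java.nio.file.Paths.get("build.log"))` would load them, where `build.log` is a hypothetical file name.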

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.

Closes #1933 from pan3793/CELEBORN-1003.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-28 19:23:54 +08:00
sychen
38a68163e0 [CELEBORN-957] Simplify nano time duration calculation
### What changes were proposed in this pull request?
Use `TimeUnit.NANOSECONDS.toMillis` instead of dividing by `1000_000`.
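
For example, using a hypothetical helper (not the patched code). `toMillis` truncates exactly like integer division by 1,000,000, so the result does not change; only the intent becomes explicit.

```java
import java.util.concurrent.TimeUnit;

class NanoDuration {
  // Same result as (endNanos - startNanos) / 1000_000 (truncating), but the unit is explicit.
  static long elapsedMillis(long startNanos, long endNanos) {
    return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();
    Thread.sleep(20);
    System.out.println("took " + elapsedMillis(start, System.nanoTime()) + " ms");
  }
}
```
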

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1888 from cxzl25/CELEBORN-957.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-08 19:03:37 +08:00
zhouyifan279
9e01aac501
[CELEBORN-913] Implement method ShuffleDriverComponents#supportsReliableStorage
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
See https://issues.apache.org/jira/browse/SPARK-42689

### Does this PR introduce _any_ user-facing change?
Yes. Users need to set `spark.shuffle.sort.io.plugin.class` to `org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO` to enable this feature.

### How was this patch tested?
Added a new matrix dimension, `shuffle-plugin-class`, to GitHub CI to run the Spark tests with `LocalDiskShuffleDataIO` and `CelebornShuffleDataIO` respectively.

Closes #1884 from zhouyifan279/spark-driver-component.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-07 16:25:09 +08:00
jiaoqingbo
b2e03d27bd [CELEBORN-950] Change CelebornShuffleReader log level and information
### What changes were proposed in this pull request?

As Title

### Why are the changes needed?

As Title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

PASS GA

Closes #1882 from jiaoqingbo/950.

Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-06 21:07:19 +08:00
xiyu.zk
d53b6e53c7 [CELEBORN-946][GLUTEN] Record read metric should be compatible with Gluten shuffle dependency
### What changes were proposed in this pull request?
Currently, whether a shuffle is a Gluten shuffle is determined from its serde, which only works for the Velox backend. To support the ClickHouse backend as well, it is more generic to use `ColumnarShuffleDependency` as the basis for this check.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1878 from kerwin-zk/gluten.

Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-05 18:34:12 +08:00
zky.zhoukeyong
a42ec85a6e [CELEBORN-943][PERF] Pre-create CelebornInputStreams in CelebornShuffleReader
### What changes were proposed in this pull request?
This PR fixes a performance degradation, caused by RPC latency, that occurs when Spark's coalescePartitions takes effect.

### Why are the changes needed?
I encountered a performance degradation when testing TPC-DS 10T q10:
||Time|
|---|---|
|ESS|14s|
|Celeborn| 24s|

After digging into it I found out that q10 triggers partition coalescence:
![image](https://github.com/apache/incubator-celeborn/assets/948245/0b4745da-8d57-4661-a35d-683d97f56e1d)

Because I set `spark.sql.adaptive.coalescePartitions.initialPartitionNum` to 1000, `CelebornShuffleReader`
calls `shuffleClient.readPartition` sequentially 1000 times, causing the delay.

This PR calls `shuffleClient.readPartition` in parallel instead. With this change, q10 takes 14s.
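
The change can be sketched as submitting every `readPartition` call to a fixed-size pool and collecting the futures in partition order. This is a hypothetical Java sketch: `readPartition` is a stand-in for `shuffleClient.readPartition` and the 20 ms latency is invented. The 32 threads simply mirror the documented default of `celeborn.client.streamCreatorPool.threads`. The real reader may create its pool once and reuse it rather than per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

class StreamPreCreator {
  // Opens one stream per partition concurrently, then returns them in partition order.
  static <S> List<S> createAll(
      int startPartition, int endPartition, int threads, IntFunction<S> readPartition)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<S>> futures = new ArrayList<>();
      for (int p = startPartition; p < endPartition; p++) {
        final int partitionId = p;
        futures.add(pool.submit(() -> readPartition.apply(partitionId))); // RPCs overlap
      }
      List<S> streams = new ArrayList<>();
      for (Future<S> f : futures) {
        streams.add(f.get()); // waits in submission order, so partition order is kept
      }
      return streams;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated readPartition with about 20 ms of RPC latency per call.
    IntFunction<String> slowRead = partitionId -> {
      try {
        Thread.sleep(20);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "stream-" + partitionId;
    };
    long start = System.nanoTime();
    List<String> streams = createAll(0, 1000, 32, slowRead);
    long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    // Sequentially this would take about 1000 * 20 ms = 20 s.
    System.out.println(streams.size() + " streams in " + ms + " ms");
  }
}
```
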

### Does this PR introduce _any_ user-facing change?
No, but it introduces a new client-side configuration, `celeborn.client.streamCreatorPool.threads`,
which defaults to 32.

### How was this patch tested?
Ran TPC-DS 1T and passed GA.

Closes #1876 from waitinfuture/943.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-04 21:46:11 +08:00
zhouyifan279
3bad1c8abc [CELEBORN-919][FOLLOWUP] Put map index args after partition index args in CelebornShuffleReader constructor
### What changes were proposed in this pull request?

Put map index args after partition index args in CelebornShuffleReader constructor

### Why are the changes needed?

#1853 changed the order of the args in the `CelebornShuffleReader` constructor, which breaks the Gluten Celeborn shuffle manager.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Run test locally.

Closes #1869 from zhouyifan279/shuffle-reader-ctor.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-31 17:22:10 +08:00
zhouyifan279
fda735391a [CELEBORN-919][FOLLOWUP] Unify the order of map index args and partition index args in ShuffleReader related methods

### What changes were proposed in this pull request?
Unify the order of map index args and partition index args in ShuffleReader related methods.

### Why are the changes needed?
The order of map index args and partition index args in the `CelebornShuffleReader` constructor is different from the order in `SparkShuffleManager#getReader`.
This makes them easy to mix up.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Run tests locally.

Closes #1853 from zhouyifan279/columnar-shuffle-followup.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-29 17:25:46 +08:00
SteNicholas
4625484d2c [CELEBORN-830] Check available workers in CelebornShuffleFallbackPolicyRunner
### What changes were proposed in this pull request?

`CelebornShuffleFallbackPolicyRunner` should check not only the quota but also whether the cluster has available workers. If no workers are available, it falls back to external shuffle.
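
Below is a hypothetical sketch of the resulting decision. The real runner obtains quota and worker information from the master; the parameters here are stand-ins.

```java
class FallbackPolicy {
  // Hypothetical inputs standing in for the master's quota and worker information.
  static boolean shouldFallback(boolean quotaAvailable, int availableWorkers) {
    if (!quotaAvailable) {
      return true; // existing check: user quota exhausted
    }
    return availableWorkers <= 0; // new check: no worker can accept shuffle data
  }

  public static void main(String[] args) {
    System.out.println(shouldFallback(true, 0)); // true: quota is fine but no workers
    System.out.println(shouldFallback(true, 3)); // false: use Celeborn
  }
}
```
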

### Why are the changes needed?

`CelebornShuffleFallbackPolicyRunner` adds a check for available workers.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `SparkShuffleManagerSuite#testClusterNotAvailableWithAvailableWorkers`

Closes #1814 from SteNicholas/CELEBORN-830.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-29 16:56:56 +08:00
zhouyifan279
2c07c55e77 [CELEBORN-919] Move Columnar Shuffle code into an individual module
### What changes were proposed in this pull request?

Move Columnar Shuffle code into an individual module

### Why are the changes needed?

Spark 3.5 made a lot of changes to AtomicType in https://issues.apache.org/jira/browse/SPARK-42887.

This causes compilation errors when building the columnar shuffle code.

As columnar shuffle is a configurable feature, I think it's better to move the related code into an individual module. Then we can exclude this module when building with Spark 3.5 for now.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Added tests `ColumnarHashBasedShuffleWriterSuiteJ` and `CelebornColumnarShuffleReaderSuite`.

Closes #1843 from zhouyifan279/columnar-shuffle.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-28 12:19:28 +00:00
Xiduo You
ae39a97548 [CELEBORN-922] Improve celeborn shuffle manager fallback log message
### What changes were proposed in this pull request?

When the Celeborn shuffle manager falls back to the vanilla Spark shuffle manager, we should make sure DRA is disabled; otherwise the reducer task may fail when fetching blocks.

This PR logs the fallback message at ERROR level if DRA is enabled.

### Why are the changes needed?

Improve the log message.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

PASS CI

Closes #1842 from ulysses-you/log.

Authored-by: Xiduo You <ulyssesyou18@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-26 13:26:26 +08:00
youxiduo
b3700dd169
[CELEBORN-917][GLUTEN] Record read metric should be compatible with Gluten shuffle serde
### What changes were proposed in this pull request?

When updating the record read metric, we should consider whether the input record is a `ColumnarBatch`. If the serde is Gluten's columnar batch serde, we should use `ColumnarBatch.numRows`.
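
Here is a minimal sketch of the counting rule. It uses a hypothetical `ColumnarBatchStub` in place of Spark's `ColumnarBatch` so that the example runs without Spark on the classpath.

```java
class RecordReadMetric {
  // Hypothetical stand-in for Spark's ColumnarBatch; only numRows() matters here.
  static final class ColumnarBatchStub {
    private final int numRows;

    ColumnarBatchStub(int numRows) {
      this.numRows = numRows;
    }

    int numRows() {
      return numRows;
    }
  }

  private long recordsRead = 0;

  // Count rows, not deserialized objects: one columnar batch carries many rows.
  void onRecord(Object record) {
    if (record instanceof ColumnarBatchStub) {
      recordsRead += ((ColumnarBatchStub) record).numRows();
    } else {
      recordsRead += 1; // row-based serde: one object is one record
    }
  }

  long recordsRead() {
    return recordsRead;
  }

  public static void main(String[] args) {
    RecordReadMetric metric = new RecordReadMetric();
    metric.onRecord(new ColumnarBatchStub(4096));
    metric.onRecord("a row-based record");
    System.out.println(metric.recordsRead()); // 4097
  }
}
```
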

### Why are the changes needed?

Make the shuffle record read metric correct.

### Does this PR introduce _any_ user-facing change?
Yes, the metrics change.

### How was this patch tested?
Manual test.

before:
<img width="415" alt="image" src="https://github.com/apache/incubator-celeborn/assets/12025282/221ab814-4b02-4688-80ab-31f21cd900a4">

after:
<img width="415" alt="image" src="https://github.com/apache/incubator-celeborn/assets/12025282/1c7257c0-2f30-41c3-9ea8-6bc5cda3de85">

Closes #1838 from ulysses-you/gluten.

Lead-authored-by: youxiduo <youxiduo@corp.netease.com>
Co-authored-by: Xiduo You <ulyssesyou18@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-25 15:01:09 +08:00
mingji
54622814cc
[CELEBORN-885][SPARK] Shade RoaringBitmap to avoid dependency conflicts
### What changes were proposed in this pull request?
Shade RoaringBitmap to avoid dependency conflicts.

### Why are the changes needed?
Some users report that the Celeborn client introduces RoaringBitmap conflicts.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
GA and cluster.

Closes #1803 from FMX/CELEBORN-885.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-08-11 12:48:12 +08:00
Angerszhuuuu
0db2150731 [CELEBORN-808] Remove unnecessary RssShuffleManager in 0.4.0
### What changes were proposed in this pull request?
As title

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1731 from AngersZhuuuu/CELEBORN-808.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2023-07-27 17:47:44 +08:00
Angerszhuuuu
5c7848d531 [CELEBORN-804][FOLLOWUP] ShuffleManager stop should set shuffleClient to null
### What changes were proposed in this pull request?
As title

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1737 from AngersZhuuuu/CELEBORN-804-FOLLOWUP.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-20 20:35:23 +08:00
zky.zhoukeyong
6a5e3ed794 [CELEBORN-812] Cleanup SendBufferPool if idle for long
### What changes were proposed in this pull request?
Cleans up the pooled send buffers and push tasks if the SendBufferPool has been idle for more than
`celeborn.client.push.sendbufferpool.expireTimeout`.

### Why are the changes needed?
Before this PR, the SendBufferPool cached send buffers and push tasks forever. If they are large
and will not be reused in the future, this wastes memory and causes GC pressure.
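
Below is a hypothetical Java sketch of a pool that reuses buffers and drops them after an idle timeout, in the spirit of `celeborn.client.push.sendbufferpool.expireTimeout`. `IdleExpiringBufferPool` is not Celeborn's `SendBufferPool`, which pools push tasks as well as buffers. Time is passed in explicitly so the behavior is easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool: buffers are reused instead of reallocated, and all pooled
// buffers are dropped once nothing has touched the pool for expireTimeoutMs.
class IdleExpiringBufferPool {
  private final int bufferSize;
  private final long expireTimeoutMs;
  private final Deque<byte[]> idleBuffers = new ArrayDeque<>();
  private long lastUsedMs;

  IdleExpiringBufferPool(int bufferSize, long expireTimeoutMs, long nowMs) {
    this.bufferSize = bufferSize;
    this.expireTimeoutMs = expireTimeoutMs;
    this.lastUsedMs = nowMs;
  }

  synchronized byte[] acquire(long nowMs) {
    lastUsedMs = nowMs;
    byte[] buffer = idleBuffers.pollFirst();
    return buffer != null ? buffer : new byte[bufferSize]; // allocate only on a miss
  }

  synchronized void release(byte[] buffer, long nowMs) {
    lastUsedMs = nowMs;
    idleBuffers.addFirst(buffer);
  }

  // Meant to be called periodically, e.g. from a ScheduledExecutorService.
  synchronized void cleanupIfIdle(long nowMs) {
    if (nowMs - lastUsedMs > expireTimeoutMs) {
      idleBuffers.clear(); // drop references so the GC can reclaim the memory
    }
  }

  synchronized int idleCount() {
    return idleBuffers.size();
  }

  public static void main(String[] args) {
    IdleExpiringBufferPool pool = new IdleExpiringBufferPool(64 * 1024, 60_000, 0);
    byte[] first = pool.acquire(0);
    pool.release(first, 10);
    System.out.println(pool.acquire(20) == first); // true: the buffer was reused
    pool.release(first, 30);
    pool.cleanupIfIdle(60_031); // idle for 60_001 ms, longer than the 60_000 ms timeout
    System.out.println(pool.idleCount()); // 0
  }
}
```
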

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passes GA and manual tests.

Closes #1735 from waitinfuture/812-1.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-20 00:34:55 +08:00
Angerszhuuuu
5471a6afe5
[CELEBORN-804] ShuffleClient should cleanup shuffle infos when trigger unregisterShuffle
### What changes were proposed in this pull request?

After discussion, we confirmed that Spark triggers `shuffleManager.unregisterShuffle()` on both the driver and the executors. In this PR:

  1. Add a shuffle client on both the driver and executor sides of ShuffleManager.
  2. ShuffleClient calls `cleanupShuffle()` when `unregisterShuffle` is triggered.

This replaces https://github.com/apache/incubator-celeborn/pull/1719

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1726 from AngersZhuuuu/CELEBORN-804.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-19 20:50:18 +08:00
onebox-li
405b2801fa [CELEBORN-810] Fix some typos and grammar
### What changes were proposed in this pull request?
Fix some typos and grammar

### Why are the changes needed?
Ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #1733 from onebox-li/fix-typo.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-19 18:35:38 +08:00
Angerszhuuuu
c8ad39d9bd [CELEBORN-809] Directly use isDriver passed from SparkEnv
### What changes were proposed in this pull request?
As title
<img width="1051" alt="Screenshot 2023-07-19 at 1 01 25 PM" src="https://github.com/apache/incubator-celeborn/assets/46485123/26d506b2-bab9-43f5-9bbe-58d22a761bab">

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1732 from AngersZhuuuu/CELEBORN-809.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2023-07-19 15:20:01 +08:00
Cheng Pan
0db919403e Revert "[CELEBORN-798] Add heartbeat from client to LifecycleManager to clean…"
This reverts commit e56a8a8bed.
2023-07-19 15:08:45 +08:00
zky.zhoukeyong
e56a8a8bed [CELEBORN-798] Add heartbeat from client to LifecycleManager to cleanup client

### What changes were proposed in this pull request?
Add a heartbeat from the client to the LifecycleManager. In this PR, the heartbeat request contains the client's local shuffle IDs;
the LifecycleManager checks them against its local set and returns the IDs it doesn't know. Upon receiving the response,
the client calls ```unregisterShuffle``` for cleanup.
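
The exchange can be sketched as a set difference on the LifecycleManager side and a cleanup loop on the client side. The names are hypothetical and the RPC itself is omitted. Note that this commit was later reverted by 0db919403e.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntConsumer;

class ShuffleHeartbeat {
  // LifecycleManager side: the client's shuffle IDs that the manager no longer knows.
  static Set<Integer> unknownShuffleIds(Set<Integer> clientIds, Set<Integer> managerIds) {
    Set<Integer> unknown = new HashSet<>(clientIds);
    unknown.removeAll(managerIds);
    return unknown;
  }

  // Client side: clean up every shuffle reported as unknown.
  static void onHeartbeatResponse(Set<Integer> unknownIds, IntConsumer unregisterShuffle) {
    for (int shuffleId : unknownIds) {
      unregisterShuffle.accept(shuffleId);
    }
  }

  public static void main(String[] args) {
    Set<Integer> client = new HashSet<>(Arrays.asList(1, 2, 3));
    Set<Integer> manager = new HashSet<>(Arrays.asList(2));
    List<Integer> cleaned = new ArrayList<>();
    onHeartbeatResponse(unknownShuffleIds(client, manager), id -> cleaned.add(id));
    System.out.println(cleaned); // shuffles 1 and 3 are unregistered
  }
}
```
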

### Why are the changes needed?
Before this PR, the client-side ```unregisterShuffle``` was never called. When running TPC-DS 3T with Spark Thrift Server
without DRA, I found that the executor's heap contained 1.6 million PartitionLocation objects (and StorageInfo objects):
![image](https://github.com/apache/incubator-celeborn/assets/948245/43658369-7763-4511-a5b0-9b3fbdf02005)

After this PR, the number of PartitionLocation objects decreases to 275 thousand:
![image](https://github.com/apache/incubator-celeborn/assets/948245/45f8f849-186d-4cad-83c8-64bd6d18debc)

This heartbeat can be extended in the future for other purposes, e.g. reporting the client's metrics.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passes GA and manual test.

Closes #1719 from waitinfuture/798.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-17 18:14:10 +08:00
Cheng Pan
1ec4f4a9f5 [CELEBORN-801] Warn when local shuffle reader is enabled
### What changes were proposed in this pull request?

Warn when local shuffle reader is enabled.

```
Detected spark.sql.adaptive.localShuffleReader.enabled (default is true) is enabled,
it's highly recommended to disable it when use Celeborn as Remote Shuffle Service to
avoid performance degradation.
```
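
Below is a hypothetical sketch of the check. It assumes the SQL configuration is available as a plain string map; the real check reads Spark's configuration. Because the Spark default is `true`, an unset key has to be treated as enabled.

```java
import java.util.HashMap;
import java.util.Map;

class LocalShuffleReaderCheck {
  static final String KEY = "spark.sql.adaptive.localShuffleReader.enabled";

  // The Spark default is true, so an unset key counts as enabled.
  static boolean shouldWarn(Map<String, String> sqlConf) {
    return Boolean.parseBoolean(sqlConf.getOrDefault(KEY, "true").trim());
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>(); // nothing set explicitly
    if (shouldWarn(conf)) {
      System.err.println("Detected " + KEY + " (default is true) is enabled, "
          + "it's highly recommended to disable it when use Celeborn as Remote Shuffle Service "
          + "to avoid performance degradation.");
    }
  }
}
```
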

### Why are the changes needed?

When the local shuffle reader is enabled, a reduce task may read shuffle data by map ID, which does not match Celeborn's shuffle data clustering model and results in extremely poor shuffle read performance.

### Does this PR introduce _any_ user-facing change?

Yes, users will see a warning message in the driver log when `spark.sql.adaptive.localShuffleReader.enabled` is true.

### How was this patch tested?

Review.

Closes #1721 from pan3793/CELEBORN-801.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-17 16:43:50 +08:00
zky.zhoukeyong
10a1def512 [CELEBORN-802] Reuse DataPusher#idleQueue by pooling to avoid too many byte[] objects
### What changes were proposed in this pull request?
Reuse ```DataPusher#idleQueue``` by pooling in ```SendBufferPool``` to avoid too many ```byte[]```
objects in ```PushTask```.

### Why are the changes needed?
I was testing TPC-DS 3T. Before this PR, containers were killed because of OOM, and GC time was about 9.6h. For the live executors, I dumped the memory and saw about 20,000 PushTask objects and 23,356 ```64k``` byte[] objects, around 1.7G in total:
![image](https://github.com/apache/incubator-celeborn/assets/948245/7b4ee4fa-7860-4ddb-b862-181a91748092)

After this PR, no container is killed because of OOM, and GC time is about 8.6h. I also dumped an executor and found
3,584 PushTask objects and 5,783 ```64K``` byte[] objects, around 361M in total:
![image](https://github.com/apache/incubator-celeborn/assets/948245/981e8f70-52f8-4bb1-9f67-9a8b4f398392)

Also, before this PR the total execution time was `3313.8s`; after this PR it is `3229.5s`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passes GA and manual test.

Closes #1722 from waitinfuture/802.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-17 16:35:14 +08:00
zky.zhoukeyong
a7bbbd05c4 [CELEBORN-797] Decrease writeTime metric sampling frequency to improve perf
### What changes were proposed in this pull request?
1. Decrease writeTime metric sampling frequency to improve perf
2. Set default value of ```celeborn.<module>.push.timeoutCheck.threads``` and ```celeborn.<module>.fetch.timeoutCheck.threads``` to 4
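
One way to reduce the sampling frequency of a timing metric is to read the clock only on every N-th write and scale the sampled duration. The sketch below is a hypothetical illustration of that technique (`SampledWriteTimer` and its extrapolation are assumptions, not necessarily what this PR does); the clock is injected so the behavior can be checked deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical sampler: times only every N-th write and extrapolates the total.
final class SampledWriteTimer {
  private final int sampleEvery;
  private final LongSupplier clock;
  private long calls;
  private long estimatedNanos;

  SampledWriteTimer(int sampleEvery, LongSupplier clock) {
    if (sampleEvery <= 0) throw new IllegalArgumentException("sampleEvery must be positive");
    this.sampleEvery = sampleEvery;
    this.clock = clock;
  }

  /** Runs the write; reads the clock only on sampled calls. */
  void write(Runnable op) {
    if (calls++ % sampleEvery != 0) {
      op.run();
      return;
    }
    long start = clock.getAsLong();
    op.run();
    // Scale the sampled duration to approximate the unsampled calls in between.
    estimatedNanos += (clock.getAsLong() - start) * sampleEvery;
  }

  long estimatedNanos() {
    return estimatedNanos;
  }
}
```

In production the clock would typically be `System::nanoTime`; the saving comes from skipping two clock reads on most calls.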

### Why are the changes needed?
Test cases:
- case 1: ```spark.sparkContext.parallelize(1 to 8000, 8000).flatMap( _ => (1 to 15000000).iterator.map(num => num)).repartition(8000).count``` // shuffles 1.1 TB of data
- case 2: ```spark.sparkContext.parallelize(1 to 8000, 8000).flatMap( _ => (1 to 30000000).iterator.map(num => num)).repartition(8000).count``` // shuffles 2.2 TB of data

End-to-end time of the shuffle write stage:

|Case|Sort pusher before|Sort pusher after|Hash pusher before|Hash pusher after|
|----|----|----|----|-----|
|case1|4.4min|4.1min|4.4min|3.9min|
|case2|9.1min|8.4min|9.7min|8.5min|

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passes GA and manual test.

Closes #1718 from waitinfuture/797.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-14 20:51:50 +08:00
无迹
e1337972e8 [CELEBORN-792] SparkShuffleManager.getWriter uses wrong appUniqueId for Spark2

### What changes were proposed in this pull request?
As title

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA and manual test.

Closes #1717 from shujiewu/CELEBORN-792.

Authored-by: 无迹 <peter.wsj@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-14 17:17:48 +08:00
Fu Chen
90ba9f3e87 [CELEBORN-783][FOLLOWUP] Private member updates and cleanup in SortBasedPusher
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

https://github.com/apache/incubator-celeborn/pull/1699#discussion_r1259137323

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1704 from cfmcgrady/insert-record-followup.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-11 23:08:42 +08:00
Fu Chen
e47ec10cef [CELEBORN-783] Revise the conditions for the SortBasedPusher#insertRecord method
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

[comment](7adf1fca41 (r121138008))

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New UT

Closes #1699 from cfmcgrady/insert-record.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-11 11:36:29 +08:00
Fu Chen
2bd1d86d41
[CELEBORN-775] Fix executorCores calculation in SparkShuffleManager for Spark local mode
### What changes were proposed in this pull request?

As title

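### Why are the changes needed?

In Spark local mode, `spark.executor.cores` may not be set, so reading it without a default throws `NoSuchElementException`: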
```shell
$ bin/spark-shell --master local[2]
23/07/06 16:11:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/07/06 16:11:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as 'sc' (master = local[2], app id = local-1688631101733).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sparkContext.getConf.get("spark.executor.cores")
java.util.NoSuchElementException: spark.executor.cores
  at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.SparkConf.get(SparkConf.scala:245)
  ... 47 elided

scala>
```
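
A fallback that derives the core count from the `local[...]` master string can be sketched as follows. The helper `ExecutorCores` is hypothetical and Celeborn's actual fix may differ; a plain `Map` stands in for `SparkConf`. It handles `local`, `local[N]`, `local[*]` and `local[N,F]`; outside local mode, the fallback of 1 when `spark.executor.cores` is unset is an assumption, since Spark's real default depends on the cluster manager.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a plain Map stands in for SparkConf.
final class ExecutorCores {
  // Matches local, local[N], local[*], local[N,F] and local[*,F].
  private static final Pattern LOCAL =
      Pattern.compile("local(?:\\[(\\*|\\d+)(?:,\\d+)?\\])?");

  private ExecutorCores() {}

  static int resolve(Map<String, String> conf) {
    Matcher m = LOCAL.matcher(conf.getOrDefault("spark.master", ""));
    if (m.matches()) {
      String threads = m.group(1);
      if (threads == null) {
        return 1; // plain "local" runs a single thread
      }
      if (threads.equals("*")) {
        return Runtime.getRuntime().availableProcessors();
      }
      return Integer.parseInt(threads); // "local[N]" or "local[N,F]"
    }
    // Assumed fallback of 1 core when spark.executor.cores is unset.
    return Integer.parseInt(conf.getOrDefault("spark.executor.cores", "1"));
  }
}
```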

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CelebornPipelineSortSuite should cover this change

Closes #1685 from cfmcgrady/local-core-number.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-06 16:29:59 +08:00
Angerszhuuuu
693172d0bd [CELEBORN-751] Rename remain rss related class name and filenames etc
### What changes were proposed in this pull request?
Rename the remaining RSS-related class names, filenames, etc.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1664 from AngersZhuuuu/CELEBORN-751.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2023-07-04 10:20:08 +08:00
Angerszhuuuu
5c7ecb8302
[CELEBORN-754][IMPORTANT] Provide a new SparkShuffleManager to replace RssShuffleManager in the future
### What changes were proposed in this pull request?
Provide a new SparkShuffleManager to replace RssShuffleManager in the future

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1667 from AngersZhuuuu/CELEBORN-754.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-30 17:27:33 +08:00
Angerszhuuuu
4c67325a3d
[CELEBORN-720][SPARK] Correct metric peakExecutionMemory of SortBasedShuffleWriter
### What changes were proposed in this pull request?
Currently, SortBasedShuffleWriter does not update peakMemoryUsedBytes; this PR adds support for it.
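
The bookkeeping behind a peak-memory metric is a high-water mark that is updated on every allocation. The sketch below is hypothetical (`PeakMemoryTracker` is not a Celeborn or Spark class) and only shows that bookkeeping; how the final value is reported to Spark's task metrics is left out.

```java
// Hypothetical high-water-mark tracker; names are illustrative only.
final class PeakMemoryTracker {
  private long current;
  private long peak;

  /** Records an allocation and raises the peak if a new high-water mark is reached. */
  void allocate(long bytes) {
    current += bytes;
    peak = Math.max(peak, current);
  }

  /** Records memory being freed, for example after buffered records are pushed. */
  void free(long bytes) {
    current = Math.max(0, current - bytes);
  }

  long peakMemoryUsedBytes() {
    return peak;
  }
}
```

Freeing memory never lowers the peak, so the value reported at the end of the task reflects the largest footprint seen.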

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1632 from AngersZhuuuu/CELEBORN-720.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-27 18:40:06 +08:00
Fu Chen
4b8f126d54 [CELEBORN-716][BUILD] Correct the to name when renaming the Netty native library
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

Before this PR, `liborg_apache_celeborn_shaded_netty_transport_native_epoll_aarch_64.so` could not be loaded correctly.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested

```shell
> tar zxf celeborn-client-spark-3-shaded_2.12-0.4.0-SNAPSHOT.jar
> find * -name "*.so"
META-INF/native/liborg_apache_celeborn_shaded_netty_transport_native_epoll_aarch_64.so
META-INF/native/liborg_apache_celeborn_shaded_netty_transport_native_epoll_x86_64.so
```
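
As we understand Netty's native loader, a shaded Netty expects the library file name to carry the shading prefix (the package part before `io.netty`, with dots replaced by underscores) and uses the normalized architecture name, e.g. `aarch_64` rather than `aarch64`. The hypothetical helper below reproduces that rule for Linux so the renamed `.so` files listed above can be checked; it is an illustration, not Netty's code.

```java
// Hypothetical reconstruction of the expected file name of a shaded epoll library on Linux.
final class ShadedNativeLibName {
  private static final String NETTY_PACKAGE = "io.netty";

  private ShadedNativeLibName() {}

  /** e.g. ("org.apache.celeborn.shaded.io.netty", "aarch_64"). */
  static String linuxFileName(String shadedNettyPackage, String normalizedArch) {
    if (!shadedNettyPackage.endsWith(NETTY_PACKAGE)) {
      throw new IllegalArgumentException("not a (shaded) Netty package: " + shadedNettyPackage);
    }
    // The part before "io.netty" is the shading prefix; dots become underscores.
    String prefix = shadedNettyPackage
        .substring(0, shadedNettyPackage.length() - NETTY_PACKAGE.length())
        .replace('.', '_');
    return "lib" + prefix + "netty_transport_native_epoll_" + normalizedArch + ".so";
  }
}
```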

Closes #1625 from cfmcgrady/typo.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-26 21:57:06 +08:00
Fu Chen
1b3ec61690 [CELEBORN-711][TEST] Rework PushDataTimeoutTest
### What changes were proposed in this pull request?

1. Separate the push data timeout tests and the push merge data timeout tests in `PushDataTimeoutTest`
2. Update the test result assertions
3. Rework `pushdata timeout will add to blacklist`

### Why are the changes needed?

To ensure that the timeout behavior is implemented correctly.

https://github.com/apache/incubator-celeborn/pull/1613#discussion_r1236423721

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes #1620 from cfmcgrady/push-timeout-test.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-26 13:45:27 +08:00
zky.zhoukeyong
6b82ecdfa0 [CELEBORN-712] Make appUniqueId a member of ShuffleClientImpl and refactor code
### What changes were proposed in this pull request?
Make appUniqueId a member of ShuffleClientImpl and remove applicationId from RPC messages exchanged only within the client side, so it won't cause compatibility issues.

### Why are the changes needed?
Currently Celeborn Client is bound to a single application id, so there's no need to pass applicationId around in many RPC messages in client side.
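
The design change can be sketched as a client that stores its application ID once and derives per-shuffle identifiers from it, so callers pass only shuffle-level arguments. `BoundShuffleClient` and the `appId-shuffleId` key format are hypothetical illustrations, not `ShuffleClientImpl`'s actual API.

```java
import java.util.Objects;

// Hypothetical sketch: a client bound to one application for its whole lifetime.
final class BoundShuffleClient {
  private final String appUniqueId;

  BoundShuffleClient(String appUniqueId) {
    this.appUniqueId = Objects.requireNonNull(appUniqueId, "appUniqueId");
  }

  /** Callers supply only the shuffle ID; the application ID comes from the member field. */
  String shuffleKey(int shuffleId) {
    return appUniqueId + "-" + shuffleId;
  }

  String appUniqueId() {
    return appUniqueId;
  }
}
```

Because the ID is fixed at construction, it cannot drift between calls, and request messages no longer need to carry it.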

### Does this PR introduce _any_ user-facing change?
In some logs the application id will not be printed, which should not be a problem.

### How was this patch tested?
UTs.

Closes #1621 from waitinfuture/appid.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-25 21:37:16 +08:00
Fu Chen
18f2be0fbe
[CELEBORN-693][SPARK] Align the incWriterTime in the hash-based shuffle writer with the sort-based shuffle
### What changes were proposed in this pull request?

As title.

### Why are the changes needed?

https://github.com/apache/incubator-celeborn/pull/1585#issuecomment-1589164128

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

tested locally.

Closes #1604 from cfmcgrady/hash-based-writer-metrics.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-19 15:42:01 +08:00
sychen
e734ceb558 [MINOR] Cleanup code
### What changes were proposed in this pull request?
1. Use `<arg>-Ywarn-unused-import</arg>` to find and remove some unused imports.
   The flag cannot be kept enabled in the build at this stage, because we have the following code (the import is only needed for Scala 2.11):
   ```scala
   // Can Remove this if celeborn don't support scala211 in future
   import org.apache.celeborn.common.util.FunctionConverter._
   ```
2. Fix Scala pattern matches that are not exhaustive, to avoid `scala.MatchError`.
3. Fix some Scala compilation warnings.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1600 from cxzl25/cleanup_code.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-19 11:31:51 +08:00
Fu Chen
b9c9c00697 [CELEBORN-683][SPARK][PERF] Avoid calling CelebornConf.get multiple times when columnar shuffle write is enabled.

### What changes were proposed in this pull request?

as title.
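
The optimization pattern is to read a configuration value once, outside the per-record hot path, and keep it in a final field. The sketch below is hypothetical: `CountingConf` stands in for a config object with relatively costly lookups, and the key `example.columnarShuffle.enabled` is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a config object; counts lookups to make the cost visible.
final class CountingConf {
  private final Map<String, String> values = new HashMap<>();
  int lookups;

  CountingConf set(String key, String value) {
    values.put(key, value);
    return this;
  }

  boolean getBoolean(String key, boolean defaultValue) {
    lookups++;
    String v = values.get(key);
    return v == null ? defaultValue : Boolean.parseBoolean(v);
  }
}

// Hypothetical writer: reads the flag once in the constructor instead of per record.
final class CachedFlagWriter {
  private final boolean columnarShuffleEnabled;
  int columnarRecords;

  CachedFlagWriter(CountingConf conf) {
    this.columnarShuffleEnabled = conf.getBoolean("example.columnarShuffle.enabled", false);
  }

  void write(int record) {
    if (columnarShuffleEnabled) { // hot path: a field read, no config lookup
      columnarRecords++;
    }
  }
}
```

This is safe only when the value cannot change during the writer's lifetime, which is the usual case for per-task configuration.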

### Why are the changes needed?

Flame graph and stage duration before:

![Screenshot 2023-06-15 at 4.49.04 PM](https://github.com/apache/incubator-celeborn/assets/8537877/6fe7f7f6-fd36-42ec-a6a1-9a4943022dc8)

![Screenshot 2023-06-15 at 4.57.53 PM](https://github.com/apache/incubator-celeborn/assets/8537877/077f6c22-4dc9-497a-affe-ddba9200fe28)

Flame graph and stage duration after:

![Screenshot 2023-06-15 at 4.37.45 PM](https://github.com/apache/incubator-celeborn/assets/8537877/d6ae7aa6-95c7-490e-a0ae-c110e6a83e5a)

![Screenshot 2023-06-15 at 4.58.12 PM](https://github.com/apache/incubator-celeborn/assets/8537877/e8dd5c3b-94d9-47d7-a644-4897acef43ad)

### Does this PR introduce _any_ user-facing change?

No, only perf improvement.

### How was this patch tested?

tested locally.

Closes #1595 from cfmcgrady/columnar-conf.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-15 17:52:23 +08:00