Commit Graph

139 Commits

Author SHA1 Message Date
sychen
5746eb36ae
[CELEBORN-1198] Keep debug info when use SBT build
### What changes were proposed in this pull request?
Add "-g" to javac compile parameters when using SBT build

> -g
Generates all debugging information, including local variables. By default, only line number and source file information is generated.

https://docs.oracle.com/en/java/javase/17/docs/specs/man/javac.html
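
The effect of `-g` can be checked directly. The following is a minimal sketch, not part of this PR: it compiles a tiny class with and without `-g` through the JDK's `javax.tools` API and reports whether the resulting class file mentions a `LocalVariableTable` attribute. The class name `DebugInfoSketch`, the helper `hasLocalVariableTable`, and the `Demo` source are made up for illustration, and the check is a simple byte search rather than a real class-file parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper for illustration; not part of Celeborn or its SBT build.
final class DebugInfoSketch {
    private static final String SOURCE =
        "class Demo { int twice(int paramValue) { int doubled = paramValue * 2; return doubled; } }";

    /** Compiles Demo.java with the given javac options and looks for local-variable debug info. */
    static boolean hasLocalVariableTable(String... javacOptions) {
        try {
            Path dir = Files.createTempDirectory("debug-info-sketch");
            Path src = dir.resolve("Demo.java");
            Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));

            // Requires a JDK: getSystemJavaCompiler() returns null on a plain JRE.
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("no system Java compiler available");
            }

            // Arguments: the caller's options, then "-d <dir> <source file>".
            String[] args = new String[javacOptions.length + 3];
            System.arraycopy(javacOptions, 0, args, 0, javacOptions.length);
            args[javacOptions.length] = "-d";
            args[javacOptions.length + 1] = dir.toString();
            args[javacOptions.length + 2] = src.toString();
            if (javac.run(null, null, null, args) != 0) {
                throw new IllegalStateException("compilation failed");
            }

            // When the attribute is present, its name is stored as a constant in the class file.
            byte[] bytes = Files.readAllBytes(dir.resolve("Demo.class"));
            return new String(bytes, StandardCharsets.ISO_8859_1).contains("LocalVariableTable");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a stock JDK, `DebugInfoSketch.hasLocalVariableTable()` should return `false`, because javac emits only line number and source file information by default, while `DebugInfoSketch.hasLocalVariableTable("-g")` should return `true`.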

### Why are the changes needed?
`maven-compiler-plugin` defaults to `debug=true`, which makes `plexus-compiler-javac` add the `-g` parameter.

SBT does not have this behavior by default, so the jars produced by the Maven and SBT builds differ even though the code logic is the same.

https://maven.apache.org/plugins/maven-compiler-plugin/compile-mojo.html#debug

736da68adf/src/main/java/org/apache/maven/plugin/compiler/AbstractCompilerMojo.java (L734)

6ae79d7f2f/plexus-compilers/plexus-compiler-javac/src/main/java/org/codehaus/plexus/compiler/javac/JavacCompiler.java (L279-L285)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
```bash
./build/sbt celeborn-worker/package
```

#### Current
`String paramString`

<img width="1450" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/9582402c-93e1-4dc2-b094-0f23c30390a9">

#### PR
<img width="1278" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/82ac3c3d-b3ad-4c94-a73f-09e88371911d">

Closes #2188 from cxzl25/CELEBORN-1198.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-26 15:24:39 +08:00
Fu Chen
173950bca2 [CELEBORN-1194] Add sbt-pgp plugin for publishing signed artifacts
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2184 from cfmcgrady/sbt-pgp-plugin.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-12-21 19:27:09 +08:00
pengqli
a808c252ba
[CELEBORN-1184] Update the snakeyaml version from 1.33 to 2.2
### What changes were proposed in this pull request?
Update the snakeyaml version from 1.33 to 2.2 to reduce direct CVE vulnerabilities.

### Why are the changes needed?
The current snakeyaml version has the following CVE vulnerability:
https://scout.docker.com/vulnerabilities/id/CVE-2022-1471

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Ran `./build/make-distribution.sh` to package the distribution and ran the tests locally.

Closes #2170 from dev-lpq/snakeyaml_version.

Authored-by: pengqli <pengqli@cisco.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-12-20 21:23:22 +08:00
Fu Chen
eba1efbb04 [CELEBORN-1191] Migrate the release script from Maven to SBT
### What changes were proposed in this pull request?

1. Migrate the release script from Maven to SBT.
2. Add new clients for publishing:
- `celeborn-client-spark-3-shaded_2.13`
- `celeborn-client-mr-shaded_2.12`

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2178 from cfmcgrady/release-sbt.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-12-20 20:20:33 +08:00
pengqli
1037fbf921 [CELEBORN-1173] Upgrade netty version from 4.1.93.Final to 4.1.101.Final
### What changes were proposed in this pull request?
Upgrade netty-all from 4.1.93.Final to 4.1.101.Final to reduce direct CVE vulnerabilities.

### Why are the changes needed?
The current netty version has the following CVE vulnerabilities:
https://scout.docker.com/vulnerabilities/id/CVE-2023-4586
https://scout.docker.com/vulnerabilities/id/CVE-2023-44487
https://scout.docker.com/vulnerabilities/id/GHSA-xpw8-rcwv-8f8p

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Ran `./build/make-distribution.sh` to package the distribution and ran the tests locally.

Closes #2150 from dev-lpq/update_netty_all_version.

Lead-authored-by: pengqli <pengqli@cisco.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-16 14:03:37 +08:00
pengqli
0860553e18 [CELEBORN-1163] Upgrade protobuf from 3.19.2 to 3.21.7
### What changes were proposed in this pull request?
Upgrade protobuf from 3.19.2 to 3.21.7 to reduce direct CVE vulnerabilities.

### Why are the changes needed?

The current protobuf version has the following CVE vulnerabilities:
https://scout.docker.com/vulnerabilities/id/CVE-2022-3510
https://scout.docker.com/vulnerabilities/id/CVE-2022-3509
https://scout.docker.com/vulnerabilities/id/CVE-2021-22570
https://scout.docker.com/vulnerabilities/id/CVE-2021-22569

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Ran `./build/make-distribution.sh` to package the distribution and ran the tests locally.

Closes #2142 from dev-lpq/upgrade_protobuf-java_version.

Authored-by: pengqli <pengqli@cisco.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-16 13:58:36 +08:00
Fu Chen
41df4ebbea [CELEBORN-1156][BUILD] SBT publish support
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

Yes, users can publish the shaded clients via SBT.

### How was this patch tested?

```shell
docker run -d -p 8081:8081 sonatype/nexus3
```

```shell
export SONATYPE_SNAPSHOTS_URL=http://192.168.3.46:8081/repository/maven-snapshots/
export SONATYPE_RELEASES_URL=http://192.168.3.46:8081/repository/maven-releases/
export ASF_USERNAME=admin
export ASF_PASSWORD=123456
```

- Publish the shaded client for Spark 3.4:
```shell
./build/sbt -Pspark-3.4 celeborn-client-spark-3-shaded/publish
```

<img width="1673" alt="Screenshot 2023-12-08 10.22.07 PM" src="https://github.com/apache/incubator-celeborn/assets/8537877/1e87e7e2-cf3b-4bc0-8272-0f5b03ee65bf">

- Publish the shaded client for Flink 1.18:

```shell
./build/sbt -Pflink-1.18 celeborn-client-flink-1_18-shaded/publish
```
<img width="1676" alt="Screenshot 2023-12-08 10.25.28 PM" src="https://github.com/apache/incubator-celeborn/assets/8537877/62d0c3c4-e105-4e8a-8d8d-e78650a2eb09">

- Publish the shaded client for MapReduce:
```shell
./build/sbt -Pmr celeborn-client-mr-shaded/publish
```
<img width="1672" alt="Screenshot 2023-12-08 10.25.47 PM" src="https://github.com/apache/incubator-celeborn/assets/8537877/563d5ad5-fa6d-46fc-9465-8279ef96385a">

Closes #2129 from cfmcgrady/sbt-publish.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-12-15 11:22:35 +08:00
sychen
1567fec194 [CELEBORN-1169] Bump Spark from 3.4.1 to 3.4.2
### What changes were proposed in this pull request?

### Why are the changes needed?
[Spark 3.4.2 released](https://spark.apache.org/news/spark-3-4-2-released.html)
November 30, 2023

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2157 from cxzl25/CELEBORN-1169.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-14 23:06:01 +08:00
sychen
2504b50dd2 [CELEBORN-1170] Upgrade snappy-java from 1.1.8.2 to 1.1.10.5
### What changes were proposed in this pull request?

### Why are the changes needed?
https://github.com/apache/incubator-celeborn/pull/2143

The snappy-java 1.1.8.2 version has the follow CVE vulnerabilities, see
https://scout.docker.com/vulnerabilities/id/CVE-2023-43642
https://scout.docker.com/vulnerabilities/id/CVE-2023-34455

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2158 from cxzl25/CELEBORN-1170.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-14 22:28:32 +08:00
Fu Chen
e4b22787b8 [CELEBORN-1159][BUILD] Update the scope of the protobuf-java dependency from protobuf to runtime
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

Update the `protobuf-java` dependency scope from `protobuf` to `runtime`. This modification allows users to upgrade the version of `protobuf-java`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2144 from cfmcgrady/protobuf-scope.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-12-11 18:33:15 +08:00
qinrui
04a1e90207 [CELEBORN-1122] Metrics supports json format
### What changes were proposed in this pull request?
If users collect monitoring metrics with a tool other than Prometheus, metrics in JSON format are more user-friendly. This PR adds JSON format support for metrics.
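
As a rough illustration of what "metrics in JSON format" means for a non-Prometheus consumer, the sketch below renders a map of gauge values as a flat JSON object. It is an assumption-laden example only: the class `JsonMetricsSketch`, the method `toJson`, and the flat name-to-value layout are hypothetical and do not reflect the JSON schema this PR actually produces.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; the real JSON layout of Celeborn metrics is not shown in this PR description.
final class JsonMetricsSketch {
    /**
     * Renders gauge values as a flat JSON object such as {"name":1,"other":2}.
     * Metric names are assumed to need no JSON escaping (no quotes or backslashes).
     */
    static String toJson(Map<String, Long> gauges) {
        return gauges.entrySet().stream()
            .map(entry -> "\"" + entry.getKey() + "\":" + entry.getValue())
            .collect(Collectors.joining(",", "{", "}"));
    }
}
```

Any HTTP client or log shipper that understands JSON could consume such output without a Prometheus text-format parser.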

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
Yes. Metrics now support JSON format.

### How was this patch tested?
Cluster test.

Closes #2089 from suizhe007/CELEBORN-1122.

Authored-by: qinrui <qr7972@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-12-06 09:24:28 +08:00
sychen
1c7cd1bd13
[CELEBORN-1113][FOLLOWUP] Bump Hadoop client version from 3.2.4 to 3.3.6
### What changes were proposed in this pull request?

### Why are the changes needed?
https://github.com/apache/incubator-celeborn/pull/2077#issuecomment-1835701576

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2127 from cxzl25/CELEBORN-1113-FOLLOWUP.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-01 17:24:37 +08:00
sychen
89b6cac5ab
[CELEBORN-1113] Bump Hadoop client version from 3.2.4 to 3.3.6
### What changes were proposed in this pull request?

### Why are the changes needed?

[[HADOOP-17098](https://issues.apache.org/jira/browse/HADOOP-17098)] Reduce Guava dependency in Hadoop source code

The newer Hadoop client removes many Guava-related methods, which avoids some Guava conflicts.

`hadoop-client-api` 3.3.6
`hadoop-client-runtime` 3.3.6

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2077 from cxzl25/CELEBORN-1113.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-01 15:41:04 +08:00
SteNicholas
4dfcd9b56b [CELEBORN-1092] Introduce JVM monitoring in Celeborn Worker using JVMQuake
### What changes were proposed in this pull request?

Introduce JVM monitoring in Celeborn Worker using JVMQuake to enable early detection of memory management issues and facilitate fast failure.

### Why are the changes needed?

When memory management in a Celeborn Worker gets out of control, we typically rely on jvmkill as a remedy: it kills the process and generates a heap dump for post-mortem analysis. However, even with jvmkill protection, we may still hit problems caused by the JVM running out of memory, such as repeated Full GCs that perform no useful work during the pause time. Because the JVM has not exhausted 100% of its resources, jvmkill is not triggered. JVMQuake is therefore introduced to monitor GC behavior at a finer granularity, enabling early detection of memory management issues and fast failure. It follows the principle of [jvmquake](https://github.com/Netflix-Skunkworks/jvmquake), a JVMTI agent that attaches to the JVM and automatically signals and kills it when the program has become unstable.
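
The underlying idea can be sketched as a running "GC debt" counter: time spent in GC adds to the debt, time spent running application code pays it down, and the process is considered unstable once the debt exceeds a threshold. The code below is a simplified illustration under that assumption. The class `GcDebtSketch`, its parameters, and the weighting are hypothetical and are not taken from Celeborn's `JVMQuake` class or from the jvmquake agent.

```java
// Hypothetical sketch of GC-debt accounting; names and parameters are illustrative only.
final class GcDebtSketch {
    private final long thresholdMillis;   // debt above this value means "unstable"
    private final double runtimeWeight;   // how strongly running time pays down the debt
    private long debtMillis = 0;

    GcDebtSketch(long thresholdMillis, double runtimeWeight) {
        this.thresholdMillis = thresholdMillis;
        this.runtimeWeight = runtimeWeight;
    }

    /**
     * Records one observation window and returns true if the accumulated
     * debt now exceeds the threshold.
     */
    boolean record(long gcMillis, long runMillis) {
        long credit = (long) (runMillis * runtimeWeight);
        // Debt grows with GC time, shrinks with running time, and never goes negative.
        debtMillis = Math.max(0, debtMillis + gcMillis - credit);
        return debtMillis > thresholdMillis;
    }

    long debt() {
        return debtMillis;
    }
}
```

A healthy JVM spends most of each window running, so the debt stays near zero. A JVM stuck in back-to-back Full GCs accumulates debt quickly and crosses the threshold long before it is completely out of memory, which is the case jvmkill does not catch.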

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`JVMQuakeSuite`

Closes #2061 from SteNicholas/CELEBORN-1092.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-28 20:45:08 +08:00
onebox-li
927d62425b
[CELEBORN-1125][FOLLOWUP] Add failureaccess shade
### What changes were proposed in this pull request?
Add failureaccess shade.

### Why are the changes needed?
When testing the main branch, the client failed with the following error:
```
Caused by: java.lang.NoClassDefFoundError: org/apache/celeborn/shaded/com/google/common/util/concurrent/internal/InternalFutureFailureAccess
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3517)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3521)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2170)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2081)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:4019)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4933)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.replyGetReducerFileGroup(ReducePartitionCommitHandler.scala:283)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.handleGetReducerFileGroup(ReducePartitionCommitHandler.scala:300)
	at org.apache.celeborn.client.CommitManager.handleGetReducerFileGroup(CommitManager.scala:266)
	at org.apache.celeborn.client.LifecycleManager.org$apache$celeborn$client$LifecycleManager$$handleGetReducerFileGroup(LifecycleManager.scala:628)
	at org.apache.celeborn.client.LifecycleManager$$anonfun$receiveAndReply$1.applyOrElse(LifecycleManager.scala:314)
	at org.apache.celeborn.common.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.celeborn.common.rpc.netty.Inbox.safelyCall(Inbox.scala:222)
	at org.apache.celeborn.common.rpc.netty.Inbox.process(Inbox.scala:110)
	at org.apache.celeborn.common.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #2116 from onebox-li/shade-add-failureaccess.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-27 18:01:38 +08:00
sychen
3054813a0f
[CELEBORN-856] Add mapreduce integration test
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2073 from cxzl25/CELEBORN-856.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-22 14:36:29 +08:00
Fu Chen
aab073ab16
[CELEBORN-1125] Bump guava from 14.0.1 to 32.1.3-jre
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

- Bump Guava from 14.0.1 to 32.1.3-jre.
- Following https://github.com/apache/spark/pull/26911, remove usages of Guava that no longer work in Guava 27/32 and replace them with workalikes. After this PR, Celeborn no longer relies on a specific version of Guava and is compatible with Guava 14/27/32, so we can pin Guava to 27 when running the MapReduce integration tests.
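
To illustrate what a "workalike" can look like, consider Guava's `MoreExecutors.sameThreadExecutor()`, which is available in Guava 14 but no longer exists in Guava 27/32, while its successor `MoreExecutors.directExecutor()` does not exist in Guava 14. A JDK-only replacement works with every Guava version. This is a minimal sketch with a hypothetical class name. Whether this PR touches this particular call is not stated here.

```java
import java.util.concurrent.Executor;

// Hypothetical example of a Guava-free workalike; not taken from the Celeborn code base.
final class DirectExecutorSketch {
    /** Runs each submitted task synchronously on the calling thread, using only JDK types. */
    static final Executor SAME_THREAD = Runnable::run;
}
```

Code that previously passed a Guava-provided executor can pass `DirectExecutorSketch.SAME_THREAD` instead, so it compiles and runs the same way regardless of which Guava version is on the classpath.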

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2090 from cfmcgrady/guava-27.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-21 16:18:14 +08:00
sychen
efa22a4936 [CELEBORN-1105][FLINK] Support Flink 1.18
### What changes were proposed in this pull request?

### Why are the changes needed?

```bash
flink-1.18.0
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
```

```java
Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225)
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179)
	at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90)
	at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496)
```

Flink 1.18.0 release
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/

Interface `org.apache.flink.runtime.io.network.buffer.Buffer` adds `setRecycler` method.
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers

`org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor adds parameters.
[[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate
[[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a
Program execution finished
Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished.
Job Runtime: 1635 ms
```

<img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99">

Closes #2063 from cxzl25/CELEBORN-1105.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-06 15:53:39 +08:00
sychen
23d7c20f2f [CELEBORN-1031] SBT correct the LICENSE and NOTICE for shaded client jars
### What changes were proposed in this pull request?
Flink/Spark jars packaged with SBT use the correct LICENSE and NOTICE.

### Why are the changes needed?
https://github.com/apache/incubator-celeborn/pull/1930#discussion_r1340410526

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1967 from cxzl25/CELEBORN-1031.

Lead-authored-by: sychen <sychen@ctrip.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-06 14:17:56 +08:00
sychen
6fa669748c [CELEBORN-999] MR deps check
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```
./dev/dependencies.sh  --module mr --check
./dev/dependencies.sh  --module mr --check --sbt
```

Closes #1928 from cxzl25/CELEBORN-999.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-10-11 13:56:31 +08:00
sychen
22f523537e [CELEBORN-1002] Add SBT MRClientProject
### What changes were proposed in this pull request?

### Why are the changes needed?

```bash
./build/make-distribution.sh --sbt-enabled -Pmr
```

```bash
./build/make-distribution.sh --sbt-enabled --release
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1930 from cxzl25/CELEBORN-1002.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-10-08 10:03:21 +08:00
Fu Chen
6b0addb934 [CELEBORN-989] Add support for making distribution package via SBT
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

Users have the capability to generate the binary distribution package using SBT by executing the following command:

```shell
./build/make-distribution.sh --sbt-enabled
```

### How was this patch tested?

Pass GA && locally tested.

Closes #1921 from cfmcgrady/sbt-make-dist-3.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-09-20 10:03:01 +08:00
sychen
beed2a85b0
[CELEBORN-977] Support RocksDB as recover DB backend
### What changes were proposed in this pull request?

### Why are the changes needed?

LevelDB does not support macOS on ARM.

```java
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8: dlopen(/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8, 0x0001): tried: '/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (fat file, but missing compatible architecture (have 'x86_64,i386', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (no such file), '/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (fat file, but missing compatible architecture (have 'x86_64,i386', need 'arm64'))]
  	at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
  	at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
  	at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
  	at org.apache.celeborn.service.deploy.worker.shuffledb.LevelDBProvider.initLevelDB(LevelDBProvider.java:49)
  	at org.apache.celeborn.service.deploy.worker.shuffledb.DBProvider.initDB(DBProvider.java:30)
  	at org.apache.celeborn.service.deploy.worker.storage.StorageManager.<init>(StorageManager.scala:197)
  	at org.apache.celeborn.service.deploy.worker.Worker.<init>(Worker.scala:109)
  	at org.apache.celeborn.service.deploy.worker.Worker$.main(Worker.scala:734)
  	at org.apache.celeborn.service.deploy.worker.Worker.main(Worker.scala)
```

The released `leveldbjni-all` for `org.fusesource.leveldbjni` does not support AArch64 Linux, so we need to use `org.openlabtesting.leveldbjni` instead.

See https://issues.apache.org/jira/browse/HADOOP-16614

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
local test

Closes #1913 from cxzl25/CELEBORN-977.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-19 09:20:33 +08:00
zhouyifan279
9e01aac501
[CELEBORN-913] Implement method ShuffleDriverComponents#supportsReliableStorage
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
See https://issues.apache.org/jira/browse/SPARK-42689

### Does this PR introduce _any_ user-facing change?
Yes. Users need to set `spark.shuffle.sort.io.plugin.class` to `org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO` to enable this feature.

### How was this patch tested?
Added a new matrix dimension, `shuffle-plugin-class`, to the GitHub CI to run the Spark tests with `LocalDiskShuffleDataIO` and `CelebornShuffleDataIO` respectively.

Closes #1884 from zhouyifan279/spark-driver-component.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-07 16:25:09 +08:00
Fu Chen
142d12caa5 [CELEBORN-929][INFRA] Add dependencies check CI
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1852 from cfmcgrady/audit-deps-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-09-07 14:02:07 +08:00
zhouyifan279
10c63e0a0f [CELEBORN-919][FOLLOWUP] Add SBT project sparkColumnarShuffle to sparkGroup
### What changes were proposed in this pull request?
Add sbt project `sparkColumnarShuffle` to `sparkGroup`

### Why are the changes needed?
Add the project `sparkColumnarShuffle` to the spark tests group `sparkGroup` to enable the columnar-related tests for SBT.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Run tests locally.

Closes #1854 from zhouyifan279/columnar-shuffle-sbt.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-06 21:26:18 +08:00
zhouyifan279
d701d3ae2c [CELEBORN-912] Support build with Spark 3.5
### What changes were proposed in this pull request?

Support build with Spark 3.5

### Why are the changes needed?

Keep up with upstream.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build with `mvn` and `sbt` locally.

Closes #1850 from zhouyifan279/build-spark-3.5.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-29 03:15:12 +00:00
zhouyifan279
2c07c55e77 [CELEBORN-919] Move Columnar Shuffle code into an individual module
### What changes were proposed in this pull request?

Move Columnar Shuffle code into an individual module

### Why are the changes needed?

Spark 3.5 made a lot of changes to AtomicType in https://issues.apache.org/jira/browse/SPARK-42887.

This causes compilation errors when building the columnar shuffle code.

As columnar shuffle is a configurable feature, I think it's better to move the related code into a separate module. Then we can exclude this module when building with Spark 3.5 for now.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Add test `ColumnarHashBasedShuffleWriterSuiteJ` and `CelebornColumnarShuffleReaderSuite`

Closes #1843 from zhouyifan279/columnar-shuffle.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-28 12:19:28 +00:00
Fu Chen
5e3e9e442a
[CELEBORN-906][FOLLOWUP] Removal of redundant dependency log4j-slf4j2-impl from SBT profile spark-3.4
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

To address the CI failure introduced in https://github.com/apache/incubator-celeborn/pull/1831, this pull request resolves the issue by removing the `log4j-slf4j2-impl` dependency from SBT profile `spark-3.4`. This change is prompted by the pinning of `slf4j-api` to version 1.7.36, rendering `log4j-slf4j2-impl` unnecessary.

```
[error] Test org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ failed: java.lang.NoSuchMethodError: org.apache.logging.slf4j.Log4jLoggerFactory: method <init>()V not found, took 0.0 sec
[error]     at org.slf4j.impl.StaticLoggerBinder.<init>(StaticLoggerBinder.java:53)
[error]     at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:41)
[error]     at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
[error]     at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
[error]     at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:417)
[error]     at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:362)
[error]     at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:388)
[error]     at org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ.<clinit>(SortBasedPusherSuiteJ.java:51)
[error]     ...
[error] Test org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ failed: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ, took 0.0 sec
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     ...
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1844 from cfmcgrady/celeborn-906-followup.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-28 12:12:04 +08:00
Fu Chen
6d7c5c08ae [CELEBORN-906][BUILD] Aligning dependencies between SBT and Maven
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

This PR ensures dependency alignment between SBT and Maven, based on the audit results implemented in https://github.com/apache/incubator-celeborn/pull/1797

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA and Review

Closes #1831 from cfmcgrady/align-deps-2.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-26 16:06:47 +08:00
Xiduo You
34c8d164bd
[CELEBORN-921] Upgrade sbt to 1.9.4
### What changes were proposed in this pull request?

Upgrade sbt from 1.9.3 to 1.9.4

### Why are the changes needed?

Resolves a CVE and several other issues.

<img width="1377" alt="image" src="https://github.com/apache/incubator-celeborn/assets/12025282/33710ee5-739a-4a11-8739-48f7898072bb">

Release note see: https://github.com/sbt/sbt/releases/tag/v1.9.4

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
PASS CI

Closes #1841 from ulysses-you/CELEBORN-921.

Authored-by: Xiduo You <ulyssesyou18@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-08-25 17:59:04 +08:00
Fu Chen
aa35c1cafc [CELEBORN-904] Bump Spark in spark-3.3 profile from 3.3.2 to 3.3.3
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

https://www.mail-archive.com/devspark.apache.org/msg30758.html

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1828 from cfmcgrady/spark33.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-23 20:20:22 +08:00
zwangsheng
479510cb9c [CELEBORN-888][WORKER] Tweak the logic and add unit tests for the MemoryManager#currentServingState method
### What changes were proposed in this pull request?
Tweak the logic of `MemoryManager#currentServingState`

Add Unit Test for this function

```mermaid
graph TB

A(Check Used Memory) --> B{Reach Pause Replicate Threshold}
B --> | N | C{Reach Pause Push Threshold}
B --> | Y | Z(Trigger Pause Push and Replicate)
C --> | N | D{Reach Resume Threshold}
C --> | Y | Y(Trigger Pause Push but Resume Replicate)
D --> | N | E{In Pause Mode}
D --> | Y | X(Trigger Resume Push and Replicate)
E --> | N | U(Do Nothing)
E --> | Y | Y
```
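
The flow in the diagram can be sketched in Java roughly as follows. This is a minimal illustration of the decision order, not the actual `MemoryManager` code: the class name, the enum constants, the constructor parameters, and the direction of each comparison are assumptions made for the example.

```java
// Hypothetical sketch of the decision flow above; not Celeborn's MemoryManager implementation.
final class ServingStateSketch {
    enum State { NONE_PAUSED, PUSH_PAUSED, PUSH_AND_REPLICATE_PAUSED }

    private final long resumeThreshold;
    private final long pausePushThreshold;
    private final long pauseReplicateThreshold;
    private boolean paused = false;

    // Assumes resumeThreshold < pausePushThreshold < pauseReplicateThreshold.
    ServingStateSketch(long resumeThreshold, long pausePushThreshold, long pauseReplicateThreshold) {
        this.resumeThreshold = resumeThreshold;
        this.pausePushThreshold = pausePushThreshold;
        this.pauseReplicateThreshold = pauseReplicateThreshold;
    }

    State currentState(long usedMemory) {
        // Highest threshold first: pause both push and replicate.
        if (usedMemory > pauseReplicateThreshold) {
            paused = true;
            return State.PUSH_AND_REPLICATE_PAUSED;
        }
        // Next threshold: pause push only, replicate may resume.
        if (usedMemory > pausePushThreshold) {
            paused = true;
            return State.PUSH_PAUSED;
        }
        // Memory has dropped below the resume threshold: resume everything.
        if (usedMemory < resumeThreshold) {
            paused = false;
            return State.NONE_PAUSED;
        }
        // In between: keep push paused if we were already paused, otherwise do nothing.
        return paused ? State.PUSH_PAUSED : State.NONE_PAUSED;
    }
}
```

The last branch gives the method hysteresis: once paused, push stays paused until usage falls below the resume threshold, which avoids flapping when memory hovers between the two thresholds.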
### Why are the changes needed?
Make the logic of this method clearer, and add unit tests so that the logic is not modified by accident.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add Unit Test

Closes #1811 from zwangsheng/CELEBORN-888.

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-23 17:20:31 +08:00
Fu Chen
6af3b50508 [CELEBORN-884][BUILD] Consolidate all dependencies into a global object Dependencies
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

Consolidate all SBT dependencies into a global object `Dependencies`, similar to Maven's `dependencyManagement`, to improve dependency management.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1802 from cfmcgrady/sbt-dependencies.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-08-16 10:42:59 +08:00
mingji
54622814cc
[CELEBORN-885][SPARK] Shade RoaringBitmap to avoid dependency conflicts
### What changes were proposed in this pull request?
Shade RoaringBitmap to avoid dependency conflicts.

### Why are the changes needed?
Some users report that the Celeborn client introduces RoaringBitmap conflicts.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster.

Closes #1803 from FMX/CELEBORN-885.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-08-11 12:48:12 +08:00
Fu Chen
0d1261632d [CELEBORN-880] Remove sbt compiler plugin genjavadoc-plugin
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

The plugin may generate unexpected source files in the project root directory. This feature needs to be refined before we use it to generate Javadoc.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1799 from cfmcgrady/sbt-compiler-plugin.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-08-09 10:16:33 +08:00
Fu Chen
6ba4b7e138
[CELEBORN-850][INFRA] Add SBT CI
### What changes were proposed in this pull request?

This PR adds new GitHub Actions workflows to enable Continuous Integration using SBT based on #1764

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1771 from cfmcgrady/sbt-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-01 18:14:58 +08:00
Fu Chen
40e416c95c
[CELEBORN-843][BUILD] sbt support flink-related module build/test
### What changes were proposed in this pull request?

This PR adds packaging and testing support for Flink-related modules using SBT based on #1757

### Why are the changes needed?

improve project build speed

running flink-it tests with -Pflink-1.14

```shell
sbt:celeborn> project flink-it
sbt:flink-it> clean
sbt:flink-it> test
[success] Total time: 136 s (02:16), completed 2023-7-27 11:55:10
```

running flink-it tests with -Pflink-1.17

```shell
$ ./build/sbt -Pflink-1.17
sbt:celeborn> project flink-it
sbt:flink-it> clean
sbt:flink-it> test
[success] Total time: 168 s (02:48), completed 2023-7-27 11:28:35
```

packaging and shading the Flink 1.14 client

```shell
$ ./build/sbt -Pflink-1.14
sbt:celeborn> clean
sbt:celeborn> project celeborn-client-flink-1_14-shaded
sbt:celeborn-client-flink-1_14-shaded> assembly
[success] Total time: 35 s, completed 2023-7-27 11:51:54
```

packaging and shading the Flink 1.17 client

```shell
$ ./build/sbt -Pflink-1.17
sbt:celeborn> clean
sbt:celeborn> project celeborn-client-flink-1_17-shaded
sbt:celeborn-client-flink-1_17-shaded> assembly
[success] Total time: 39 s, completed 2023-7-27 11:49:20
```

### Does this PR introduce _any_ user-facing change?

yes

### How was this patch tested?

tested locally

Closes #1764 from cfmcgrady/sbt-flink.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-01 12:29:29 +08:00
Fu Chen
5f0295e9f3
[CELEBORN-836][BUILD] Initial support sbt
### What changes were proposed in this pull request?

This PR introduces an SBT build system implementation that operates independently of the current Maven build system. Unlike https://github.com/apache/incubator-celeborn/pull/1627, the current implementation does not depend on `pom.xml`.

The implementation enables packaging and testing functionalities for server-related modules and Spark-related modules using SBT.

Flink-related build/test support, sbt build documentation, continuous integration, and plugins will be submitted in separate PRs.

### Why are the changes needed?

improve project build speed

packaging the project

```shell
$ ./build/sbt
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:36:12
sbt:celeborn> package
[success] Total time: 28 s, completed 2023-7-25 16:36:46
```

packaging and shading the Spark 3.3 client

```shell
$ ./build/sbt -Pspark-3.3
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:39:11
sbt:celeborn> project celeborn-client-spark-3-shaded
sbt:celeborn-client-spark-3-shaded> assembly
[success] Total time: 37 s, completed 2023-7-25 16:40:03
```

packaging and shading the Spark 2.4 client

```shell
$ ./build/sbt -Pspark-2.4
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:41:06
sbt:celeborn> project celeborn-client-spark-2-shaded
sbt:celeborn-client-spark-2-shaded> assembly
[success] Total time: 36 s, completed 2023-7-25 16:41:53
```

running server-related tests

```shell
$ ./build/sbt clean test
[success] Total time: 350 s (05:50), completed 2023-7-25 16:48:58
```

### Does this PR introduce _any_ user-facing change?

yes

### How was this patch tested?

tested locally

Closes #1757 from cfmcgrady/pure-sbt.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-28 10:40:04 +08:00