Commit Graph

110 Commits

Author SHA1 Message Date
SteNicholas
4dfcd9b56b [CELEBORN-1092] Introduce JVM monitoring in Celeborn Worker using JVMQuake
### What changes were proposed in this pull request?

Introduce JVM monitoring in Celeborn Worker using JVMQuake to enable early detection of memory management issues and facilitate fast failure.

### Why are the changes needed?

When memory management in a Celeborn Worker gets out of control, we typically use jvmkill as a remedy: it kills the process and generates a heap dump for post-mortem analysis. However, even with jvmkill protection, we may still hit problems caused by the JVM running low on memory, such as repeated Full GCs during which no useful work is done. Because the JVM has not exhausted 100% of its resources, jvmkill is not triggered. JVMQuake is therefore introduced to monitor GC behavior at a finer granularity, enabling early detection of memory management issues and fast failure. It follows the principle of [jvmquake](https://github.com/Netflix-Skunkworks/jvmquake), a JVMTI agent that attaches to the JVM and automatically signals and kills it when the program has become unstable.
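
The jvmquake principle can be sketched as a "GC debt" counter: time spent in GC pauses adds to the debt, time spent running application code pays it down, and the process is treated as unhealthy once the debt exceeds a threshold. The class below is a minimal illustration of that idea only. The name `GcDebtMonitor`, its parameters, and its units are hypothetical; it is not Celeborn's implementation or configuration.

```java
/** Hypothetical sketch of a jvmquake-style "GC debt" counter; not Celeborn's actual code. */
final class GcDebtMonitor {
    private final long thresholdNanos;   // debt above which the JVM is considered unhealthy
    private final double runtimeWeight;  // how quickly useful run time pays the debt down
    private long debtNanos;              // accumulated debt, never negative

    GcDebtMonitor(long thresholdNanos, double runtimeWeight) {
        this.thresholdNanos = thresholdNanos;
        this.runtimeWeight = runtimeWeight;
    }

    /** Records one interval: time spent in GC pauses and time spent running application code. */
    void record(long gcNanos, long runNanos) {
        long paidDown = (long) (runNanos * runtimeWeight);
        debtNanos = Math.max(0L, debtNanos + gcNanos - paidDown);
    }

    /** True once GC time has outweighed useful run time by more than the threshold. */
    boolean exceeded() {
        return debtNanos > thresholdNanos;
    }

    long debtNanos() {
        return debtNanos;
    }
}
```

A caller would feed `record` with per-interval GC and run times (for example derived from `java.lang.management.GarbageCollectorMXBean#getCollectionTime()`) and fail fast when `exceeded()` returns true.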

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`JVMQuakeSuite`

Closes #2061 from SteNicholas/CELEBORN-1092.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-28 20:45:08 +08:00
sychen
3054813a0f
[CELEBORN-856] Add mapreduce integration test
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2073 from cxzl25/CELEBORN-856.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-22 14:36:29 +08:00
Fu Chen
aab073ab16
[CELEBORN-1125] Bump guava from 14.0.1 to 32.1.3-jre
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

- bump guava from 14.0.1 to 32.1.3-jre
- Following https://github.com/apache/spark/pull/26911, remove usages of Guava that no longer work in Guava 27/32 and replace them with workalikes. After this PR, Celeborn no longer relies on a specific version of Guava and is compatible with Guava 14/27/32, so we can set Guava to 27 when running the MapReduce integration tests.
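
As an illustration of what a "workalike" replacement can look like, the sketch below uses only JDK APIs (`java.util.Objects`) where older code might have called Guava helpers. For instance, `com.google.common.base.Objects.toStringHelper` exists in Guava 14 but was removed in later Guava releases, so code calling it cannot run against every version. The `HostPort` class is hypothetical and is not taken from this change.

```java
import java.util.Objects;

/** Hypothetical value class that avoids version-sensitive Guava helpers by using the JDK only. */
final class HostPort {
    private final String host;
    private final int port;

    HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HostPort)) return false;
        HostPort other = (HostPort) o;
        // java.util.Objects.equals instead of Guava's Objects.equal
        return port == other.port && Objects.equals(host, other.host);
    }

    @Override
    public int hashCode() {
        // java.util.Objects.hash instead of Guava's Objects.hashCode
        return Objects.hash(host, port);
    }

    @Override
    public String toString() {
        // Plain concatenation instead of Guava's Objects.toStringHelper
        return "HostPort{host=" + host + ", port=" + port + "}";
    }
}
```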

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2090 from cfmcgrady/guava-27.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-21 16:18:14 +08:00
sychen
efa22a4936 [CELEBORN-1105][FLINK] Support Flink 1.18
### What changes were proposed in this pull request?

### Why are the changes needed?

```bash
flink-1.18.0
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
```

```java
Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225)
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179)
	at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90)
	at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496)
```

Flink 1.18.0 release
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/

Interface `org.apache.flink.runtime.io.network.buffer.Buffer` adds `setRecycler` method.
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers

`org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor adds parameters.
[[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate
[[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a
Program execution finished
Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished.
Job Runtime: 1635 ms
```

<img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99">

Closes #2063 from cxzl25/CELEBORN-1105.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-06 15:53:39 +08:00
Mridul Muralidharan
52d6861eb7
[CELEBORN-1070] Add error-prone to pom.xml
### What changes were proposed in this pull request?

Add [error prone](https://errorprone.info/) to the build.
Error Prone is a static analysis tool that can catch common bugs and mistakes during compilation.

### Why are the changes needed?
Catch potential issues during build

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Local build

Closes #2025 from mridulm/add-errorprone-to-pom.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Co-authored-by: Mridul Muralidharan <1591700+mridulm@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-10-24 11:44:41 +08:00
SteNicholas
c97628c510 [CELEBORN-987][DOC] README#Build should extend to Java8/11/17
### What changes were proposed in this pull request?

Extend `README#Build` to cover Java 8/11/17, and add a `jdk-17` profile to the Maven build.

### Why are the changes needed?

`README#Build` should cover Java 8/11/17, and the Maven build should have a `jdk-17` profile.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Local maven compile.

Closes #1985 from SteNicholas/CELEBORN-987.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-10-12 21:58:32 +08:00
Mridul Muralidharan
3a41db360b
[CELEBORN-1006] Add support for Apache Hadoop 2.x in Celeborn build
Add support for Apache Hadoop 2.x in Celeborn build
Developers need to only specify their `hadoop.version`, and the build will pick the right profile internally based on the version to add the relevant dependencies.

[hadoop-client-api](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api) and [hadoop-client-runtime](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-runtime) were introduced in hadoop 3.x, while hadoop 2.x had [hadoop-client](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client)
Celeborn depends on the former, and so requires hadoop 3.x to build.

Apache Spark dropped support for Hadoop 2.x only in the recent v3.5 ([SPARK-42452](https://issues.apache.org/jira/browse/SPARK-42452)). Given this, deployments on supported platforms such as Spark 3.4 and older running on Hadoop 2.x would need to pull in Hadoop 3.x just for Celeborn.

This PR uses `hadoop-client` when `hadoop.version` is specified as 2.x - and preserves existing behavior when `hadoop.version` is 3.x

Note - while using `hadoop-client` in 3.x is an option, hadoop community recommendation is to rely on `hadoop-client-api`/`hadoop-client-runtime`, hence making an effort to leverage that as much as possible.
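
The selection rule described above can be sketched as a small function: a 2.x `hadoop.version` maps to `hadoop-client`, while a 3.x version maps to `hadoop-client-api` and `hadoop-client-runtime`. In the build itself this is handled by Maven profiles; the `HadoopArtifacts` class below is a hypothetical illustration, not part of the change.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of the hadoop.version to artifact mapping; the real logic is in pom.xml. */
final class HadoopArtifacts {
    static List<String> forVersion(String hadoopVersion) {
        // The major version is the part before the first dot, e.g. "2" in "2.10.0".
        int major = Integer.parseInt(hadoopVersion.split("\\.")[0]);
        if (major == 2) {
            return Collections.singletonList("hadoop-client");
        }
        if (major == 3) {
            return Arrays.asList("hadoop-client-api", "hadoop-client-runtime");
        }
        throw new IllegalArgumentException("Unsupported hadoop.version: " + hadoopVersion);
    }
}
```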

Adds support for using 2.x for hadoop.version

Three combinations were tested:

* Default, without overriding hadoop.version

Dependencies:
```
$ build/mvn dependency:list 2>&1 | grep hadoop | sort | uniq
[INFO]    org.apache.hadoop:hadoop-client-api:jar:3.2.4:compile
[INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.2.4:compile
```

Will update this section again based on test suite results (which are ongoing)

* Setting hadoop.version to newer 3.3.0 explicitly

Dependencies:
```
$ ARGS="-Pspark-3.1 -Dhadoop.version=3.3.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | sort | uniq
[INFO]    org.apache.hadoop:hadoop-client-api:jar:3.3.0:compile
[INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.3.0:compile
```

* Setting hadoop.version to older 2.10.0

Dependencies:
```
$ ARGS="-Pspark-3.1 -Dhadoop.version=2.10.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | grep compile | sort | uniq
[INFO]    org.apache.hadoop:hadoop-auth:jar:2.10.0:compile -- module hadoop.auth (auto)
[INFO]    org.apache.hadoop:hadoop-client:jar:2.10.0:compile -- module hadoop.client (auto)
[INFO]    org.apache.hadoop:hadoop-common:jar:2.10.0:compile -- module hadoop.common (auto)
[INFO]    org.apache.hadoop:hadoop-hdfs-client:jar:2.10.0:compile -- module hadoop.hdfs.client (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.10.0:compile -- module hadoop.mapreduce.client.app (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.10.0:compile -- module hadoop.mapreduce.client.common (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.10.0:compile -- module hadoop.mapreduce.client.core (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.10.0:compile
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.10.0:compile -- module hadoop.mapreduce.client.shuffle (auto)
[INFO]    org.apache.hadoop:hadoop-yarn-api:jar:2.10.0:compile -- module hadoop.yarn.api (auto)
[INFO]    org.apache.hadoop:hadoop-yarn-common:jar:2.10.0:compile -- module hadoop.yarn.common (auto)
```

For each of the cases above, the build and tests pass with the corresponding `ARGS`.

Closes #1936 from mridulm/main.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-25 20:15:02 +08:00
sychen
beed2a85b0
[CELEBORN-977] Support RocksDB as recover DB backend
### What changes were proposed in this pull request?

### Why are the changes needed?

LevelDB does not support macOS on ARM (arm64).

```java
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8: dlopen(/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8, 0x0001): tried: '/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (fat file, but missing compatible architecture (have 'x86_64,i386', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (no such file), '/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (fat file, but missing compatible architecture (have 'x86_64,i386', need 'arm64'))]
  	at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
  	at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
  	at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
  	at org.apache.celeborn.service.deploy.worker.shuffledb.LevelDBProvider.initLevelDB(LevelDBProvider.java:49)
  	at org.apache.celeborn.service.deploy.worker.shuffledb.DBProvider.initDB(DBProvider.java:30)
  	at org.apache.celeborn.service.deploy.worker.storage.StorageManager.<init>(StorageManager.scala:197)
  	at org.apache.celeborn.service.deploy.worker.Worker.<init>(Worker.scala:109)
  	at org.apache.celeborn.service.deploy.worker.Worker$.main(Worker.scala:734)
  	at org.apache.celeborn.service.deploy.worker.Worker.main(Worker.scala)
```

The released `leveldbjni-all` from `org.fusesource.leveldbjni` does not support AArch64 Linux; there we need to use `org.openlabtesting.leveldbjni` instead.

See https://issues.apache.org/jira/browse/HADOOP-16614

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
local test

Closes #1913 from cxzl25/CELEBORN-977.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-19 09:20:33 +08:00
mingji
e0c00ecd38 [CELEBORN-839][MR] Support Hadoop MapReduce
### What changes were proposed in this pull request?
1. Map side merge and push.
2. Support hadoop2 & 3.
3. Reduce in-memory merge.
4. Integrate LifecycleManager to RmApplicationMaster.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Cluster.

I tested this PR on a 4x (16 CPU, 64G Mem, 4 ESSD) cluster.
Hadoop 2.8.5

1TB Terasort, 8400 mappers, 1000 reducers
Celeborn 81min vs MR shuffle 89min
![mr1](https://github.com/apache/incubator-celeborn/assets/4150993/a3cf6493-b6ff-4c03-9936-4558cf22761d)
![mr2](https://github.com/apache/incubator-celeborn/assets/4150993/9119ffb4-6996-4b77-bcdf-cbd6db5c096f)

1GB wordcount, 8 mappers, 8 reducers
Celeborn 35s VS MR shuffle 38s
![mr3](https://github.com/apache/incubator-celeborn/assets/4150993/907dce24-16b7-4788-ab5d-5b784fd07d47)
![mr4](https://github.com/apache/incubator-celeborn/assets/4150993/8e8065b9-6c46-4c8d-9e71-45eed8e63877)

Closes #1830 from FMX/CELEBORN-839.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-14 14:12:53 +08:00
zhouyifan279
9e01aac501
[CELEBORN-913] Implement method ShuffleDriverComponents#supportsReliableStorage
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
See https://issues.apache.org/jira/browse/SPARK-42689

### Does this PR introduce _any_ user-facing change?
Yes. Users need to set `spark.shuffle.sort.io.plugin.class` to `org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO` to enable this feature.

### How was this patch tested?
Add a new matrix dimension, shuffle-plugin-class, in github ci, to run spark tests over `LocalDiskShuffleDataIO` and `CelebornShuffleDataIO` respectively.

Closes #1884 from zhouyifan279/spark-driver-component.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-07 16:25:09 +08:00
Fu Chen
142d12caa5 [CELEBORN-929][INFRA] Add dependencies check CI
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1852 from cfmcgrady/audit-deps-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-09-07 14:02:07 +08:00
zhouyifan279
d701d3ae2c [CELEBORN-912] Support build with Spark 3.5
### What changes were proposed in this pull request?

Support build with Spark 3.5

### Why are the changes needed?

Keep up with upstream.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build with `mvn` and `sbt` locally.

Closes #1850 from zhouyifan279/build-spark-3.5.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-29 03:15:12 +00:00
zhouyifan279
2c07c55e77 [CELEBORN-919] Move Columnar Shuffle code into an individual module
### What changes were proposed in this pull request?

Move Columnar Shuffle code into an individual module

### Why are the changes needed?

Spark 3.5 made a lot of changes to AtomicType in https://issues.apache.org/jira/browse/SPARK-42887.

This causes compilation errors when building the columnar shuffle code.

As columnar shuffle is a configurable feature, I think it's better to move the related code into an individual module. Then we can exclude this module when building with Spark 3.5 for now.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Add test `ColumnarHashBasedShuffleWriterSuiteJ` and `CelebornColumnarShuffleReaderSuite`

Closes #1843 from zhouyifan279/columnar-shuffle.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-28 12:19:28 +00:00
Fu Chen
6d7c5c08ae [CELEBORN-906][BUILD] Aligning dependencies between SBT and Maven
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

This PR ensures dependency alignment between SBT and Maven, based on the audit results implemented in https://github.com/apache/incubator-celeborn/pull/1797

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA and Review

Closes #1831 from cfmcgrady/align-deps-2.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-26 16:06:47 +08:00
Fu Chen
49b6b10d5e [CELEBORN-879] Add dev/dependencies.sh for audit dependencies
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1797 from cfmcgrady/audit-deps.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-26 15:59:20 +08:00
Fu Chen
aa35c1cafc [CELEBORN-904] Bump Spark in spark-3.3 profile from 3.3.2 to 3.3.3
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

https://www.mail-archive.com/dev@spark.apache.org/msg30758.html

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1828 from cfmcgrady/spark33.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-23 20:20:22 +08:00
Fu Chen
e16b26762b
[CELEBORN-837][BUILD] Add silencer plugin to suppress deprecated warnings
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

Suppress all deprecation-related warnings during compilation.

This fixes warnings such as:
```
class OpenStream in package protocol is deprecated
        val openStream = msg.asInstanceOf[OpenStream]
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

tested locally

Closes #1760 from cfmcgrady/silence-deprecated.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-25 21:14:45 +08:00
Fu Chen
0bb73ece3b [CELEBORN-821][BUILD] Bump junit from 4.12 to 4.13.2
### What changes were proposed in this pull request?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1744 from cfmcgrady/junit.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-07-22 10:00:25 +08:00
Cheng Pan
5b3f43dffc
[CELEBORN-763] Add --add-opens to bootstrap shell scripts
### What changes were proposed in this pull request?

Add --add-opens to bootstrap shell scripts

### Why are the changes needed?

Additional `--add-opens` options are required for Java 17. Note that the `--add-opens` list is copied from Spark, where it was used for unit tests; I am not sure every entry is required, but at least the unit tests pass with them.

Details supplied by cfmcgrady

[JEP 403](https://openjdk.java.net/jeps/403), targeted for [JDK 17](https://openjdk.java.net/projects/jdk/17/), removes the `--illegal-access` flag, which is equivalent to always running with `--illegal-access=deny`.

This means that using reflection to access non-public members of exported `java.*` APIs no longer works. For example:

```shell
> /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell
|  Welcome to JShell -- Version 17.0.7
|  For an introduction type: /help intro

jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]

jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);
|  Exception java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @34c45dca
|        at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:354)
|        at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297)
|        at Constructor.checkCanSetAccessible (Constructor.java:188)
|        at Constructor.setAccessible (Constructor.java:181)
|        at (#2:1)

jshell>

```

```shell
>  /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell -R --add-opens=java.base/java.nio=ALL-UNNAMED
|  Welcome to JShell -- Version 17.0.7
|  For an introduction type: /help intro

jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]

jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);

jshell>
```
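
Whether a package is open for deep reflection can also be checked programmatically through `java.lang.Module` (Java 9+). The sketch below is illustrative only; the class `OpensCheck` and its method name are hypothetical and are not part of this change.

```java
/** Hypothetical helper (requires Java 9+): is the package of `target` open to this code? */
final class OpensCheck {
    static boolean isOpenToCaller(Class<?> target) {
        Module targetModule = target.getModule();
        Module callerModule = OpensCheck.class.getModule();
        // Module#isOpen(pkg, other) reports whether targetModule opens pkg to callerModule
        // at run time, e.g. as a result of --add-opens=java.base/java.nio=ALL-UNNAMED.
        return targetModule.isOpen(target.getPackageName(), callerModule);
    }
}
```

For classpath code on Java 17, `OpensCheck.isOpenToCaller(java.nio.ByteBuffer.class)` is expected to return `false` without extra flags and `true` when the JVM is started with `--add-opens=java.base/java.nio=ALL-UNNAMED`.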

### Does this PR introduce _any_ user-facing change?

Yes, for Java 17 support.

### How was this patch tested?

CI and review

Closes #1677 from pan3793/CELEBORN-763.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-05 11:31:21 +08:00
Cheng Pan
b308ac6717
[CELEBORN-742][BUILD] Bump Hadoop 3.2.4
### What changes were proposed in this pull request?

Bump Hadoop from 3.2.1 to 3.2.4.

### Why are the changes needed?

Always use the latest patched version.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

Closes #1654 from pan3793/CELEBORN-742.

Lead-authored-by: Cheng Pan <chengpan@apache.org>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-06-29 11:48:45 +08:00
Cheng Pan
78327ebd4a [CELEBORN-743][BUILD] Bump commons-io to 2.13.0
### What changes were proposed in this pull request?

Bump commons-io to 2.13.0

### Why are the changes needed?

- https://commons.apache.org/proper/commons-io/changes-report.html#a2.9.0
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.10.0
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.11.0
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.12.0
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.13.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

Closes #1655 from pan3793/CELEBORN-743.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-29 10:26:57 +08:00
Cheng Pan
3c6d90b5e5 [CELEBORN-741][BUILD] Bump Spark to latest patched version
### What changes were proposed in this pull request?

Bump Spark

- from 3.2.2 to 3.2.4
- from 3.3.1 to 3.3.2
- from 3.4.0 to 3.4.1

### Why are the changes needed?

Keep the Spark versions up-to-date

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

Closes #1653 from pan3793/CELEBORN-741.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-29 10:23:40 +08:00
Cheng Pan
c2352a2f9f [CELEBORN-736][BUILD] Bump commons-lang3 3.12.0
### What changes were proposed in this pull request?

Bump commons-lang3 to latest version

### Why are the changes needed?

- https://commons.apache.org/proper/commons-lang/changes-report.html#a3.11
- https://commons.apache.org/proper/commons-lang/changes-report.html#a3.12.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

Closes #1648 from pan3793/CELEBORN-736.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-28 21:15:44 +08:00
Cheng Pan
98744fb8ca [CELEBORN-705][BUILD] Upgrade Maven from 3.6.3 to 3.8.8
### What changes were proposed in this pull request?

Upgrade Maven from 3.6.3 to 3.8.8.

### Why are the changes needed?

Maven 3.6.3 is EOL. It was removed from the Apache Mirror site, so users can not benefit from download speedup from the mirror even with
```
export APACHE_MIRROR=https://mirrors.cloud.tencent.com/apache
```

https://mirrors.cloud.tencent.com/apache/maven/maven-3/

<img width="752" alt="image" src="https://github.com/apache/incubator-celeborn/assets/26535726/80e9e472-15c6-419e-a29b-69661615a16f">

These are logs from our CI server: it cannot download from the mirror site and has to fall back to the Apache archive server, which is extremely slow.
```
$ ./build/mvn $MVN_OPTS $BUILD_PROFILES -version
Falling back to archive.apache.org to download Maven
...
```

Why not 3.9.2?

Maven 3.9 uses the native transport-http by default, whose default timeout of 10000 ms is shorter than Wagon's default timeout of 60000 ms; this causes a lot of network timeout issues.

See details at https://github.com/apache/spark/pull/40738

### Does this PR introduce _any_ user-facing change?

Maybe, if the user uses an insecure HTTP private repo in their `pom.xml`, because [Maven 3.8 enforces HTTPS by default](https://maven.apache.org/docs/3.8.1/release-notes.html#cve-2021-26291).

As a workaround, you can leverage `sed` to remove such restrictions.
```
$ build/mvn -version
$ sed -i "s/<mirrorOf>external:http:\*/<mirrorOf>dummy/g" build/apache-maven-*/conf/settings.xml
...
```

### How was this patch tested?

Pass GA.

Closes #1615 from pan3793/CELEBORN-705.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-21 21:54:17 +08:00
sychen
e734ceb558 [MINOR] Cleanup code
### What changes were proposed in this pull request?
1. Use `<arg>-Ywarn-unused-import</arg>` to find and remove some unused imports.
The flag itself cannot be kept enabled at this stage, because we have the following code:
```
// Can Remove this if celeborn don't support scala211 in future
import org.apache.celeborn.common.util.FunctionConverter._
```
2. Fix non-exhaustive Scala pattern matches to avoid `scala.MatchError`
3. Fix some Scala compilation warnings

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1600 from cxzl25/cleanup_code.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-19 11:31:51 +08:00
zwangsheng
7d7107d607 [CELEBORN-684] Upgrade Netty from 4.1.92.Final to 4.1.93.Final
### What changes were proposed in this pull request?

`Netty` released `4.1.93.Final` 3 weeks ago, so we should update the Netty version.

[Change List](https://github.com/netty/netty/compare/netty-4.1.92.Final...netty-4.1.93.Final)

### Why are the changes needed?

Catch up with the Netty version

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

Closes #1596 from zwangsheng/CELEBORN-684.

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-16 20:05:25 +08:00
Fu Chen
3fb896b11f [CELEBORN-666] Define protobuf-maven-plugin in the root pom.xml
### What changes were proposed in this pull request?

Define `protobuf-maven-plugin` in the root pom.xml

### Why are the changes needed?

to fix

```bash
build/mvn protobuf:compile -am -pl common
```

```
[ERROR] No plugin found for prefix 'protobuf' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/Users/fchen/.m2/repository), apache.snapshots (https://repository.apache.org/snapshots), central (https://repo.maven.apache.org/maven2)] -> [Help 1]
org.apache.maven.plugin.prefix.NoPluginFoundForPrefixException: No plugin found for prefix 'protobuf' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/Users/fchen/.m2/repository), apache.snapshots (https://repository.apache.org/snapshots), central (https://repo.maven.apache.org/maven2)]
    at org.apache.maven.plugin.prefix.internal.DefaultPluginPrefixResolver.resolve (DefaultPluginPrefixResolver.java:95)
    at org.apache.maven.lifecycle.internal.MojoDescriptorCreator.findPluginForPrefix (MojoDescriptorCreator.java:266)
    at org.apache.maven.lifecycle.internal.MojoDescriptorCreator.getMojoDescriptor (MojoDescriptorCreator.java:220)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments (DefaultLifecycleTaskSegmentCalculator.java:104)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments (DefaultLifecycleTaskSegmentCalculator.java:83)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:89)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:298)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoPluginFoundForPrefixException
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

tested locally.

Closes #1579 from cfmcgrady/protobuf-plugin.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-12 19:46:46 +08:00
Cheng Pan
76533d7324
[CELEBORN-650][TEST] Upgrade scalatest and unify mockito version
### What changes were proposed in this pull request?

This PR upgrades

- `mockito` from 1.10.19 and 3.6.0 to 4.11.0
- `scalatest` from 3.2.3 to 3.2.16
- `mockito-scalatest` from 1.16.37 to 1.17.14

### Why are the changes needed?

Housekeeping, making test dependencies up-to-date and unified.

### Does this PR introduce _any_ user-facing change?

No, it only affects test.

### How was this patch tested?

Pass GA.

Closes #1562 from pan3793/CELEBORN-650.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-09 10:04:14 +08:00
Shuang
2711b8253a
[CELEBORN-641] Upgrade flink scala version to 2.12.15
### What changes were proposed in this pull request?
Use scala 2.12.15 as default scala version for flink.

### Why are the changes needed?
There is a serialization incompatibility between Scala 2.12.7 and Scala 2.12.15/Scala 2.11.12: when different Scala versions are used, the generated serialVersionUID differs, so we may hit deserialization problems in client/server RPC. See this [Scala discussion](https://users.scala-lang.org/t/serialversionuid-change-between-scala-2-12-6-and-2-12-7/3478/3).
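
For background: when a serializable class does not declare `serialVersionUID`, the serialization runtime computes a default from the compiled class's structure, so different compiler versions can produce different values. The generic Java sketch below (not Celeborn or Flink code; the class names are hypothetical) shows a pinned value being read back through `java.io.ObjectStreamClass`. This PR itself addresses the problem by aligning the Scala version rather than by pinning IDs.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical message class with an explicitly pinned serialVersionUID. */
final class PinnedMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    String payload;
}

final class SerialUidDemo {
    /** Returns the serialVersionUID the serialization runtime will use for the class. */
    static long uidOf(Class<? extends Serializable> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }
}
```

Two peers can only deserialize each other's objects of a class if this value matches on both sides.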

![image](https://github.com/apache/incubator-celeborn/assets/28799061/19ddd25e-7db5-458d-95d0-bc6ab66cd40b)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested using a Flink runtime on Scala 2.12.7 with the Celeborn Flink client compiled with Scala 2.12.15.

Closes #1553 from RexXiong/CELEBORN-641.

Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: Ethan Feng <ethanfeng@apache.org>
2023-06-07 20:46:10 +08:00
zhongqiangchen
98676cf79b
[CELEBORN-635] Exclude netty-handler-ssl-ocsp from netty dependency (#1544) 2023-06-05 13:54:33 +08:00
zwangsheng
5068d6e897
[CELEBORN-105][TEST] Kubernetes Integration Test
### What changes were proposed in this pull request?
Add Kubernetes Integration Test
- [x] test helm install deploy
- [ ] test shuffle

### Why are the changes needed?
Add integration test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI test

Closes #1484 from zwangsheng/CELEBORN-105.

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-05 12:11:29 +08:00
Cheng Pan
c29f2f0aa8
[CELEBORN-605][BUILD] Remove redundant exclusions from hadoop-client-api (#1510) 2023-05-25 10:40:15 +08:00
Cheng Pan
ef8e556202
[CELEBORN-604][SPARK] Support Spark 3.4 (#1509) 2023-05-24 23:10:13 +08:00
zhongqiangchen
e6978c380b
[CELEBORN-603] Update version to 0.4.0-SNAPSHOT (#1507) 2023-05-24 14:31:10 +08:00
Kaijie Chen
67bc420801
[CELEBORN-558] Bump Ratis to 2.5.1 and fix API changes (#1464) 2023-05-18 11:08:37 +08:00
Ethan Feng
114b1b4d62
[CELEBORN-548][FLINK] Support flink 1.17. (#1472) 2023-05-05 23:00:49 +08:00
Ethan Feng
e24569cbb7
[CELEBORN-569] Update netty version to 4.1.92. (#1476) 2023-05-05 20:01:37 +08:00
Ethan Feng
596d276323
Revert "[CELEBORN-569] Update netty version to 4.1.92."
This reverts commit a95936906b.
2023-05-05 12:34:37 +08:00
Ethan Feng
a95936906b
[CELEBORN-569] Update netty version to 4.1.92. 2023-05-05 12:30:01 +08:00
Ethan Feng
93d2f106e0
[CELEBORN-548][FLINK] Support flink 1.15. (#1463) 2023-05-04 15:23:59 +08:00
Keyong Zhou
61416a828d
[CELEBORN-497]Fix and enable JDK 11 for CI (#1401) 2023-03-31 13:39:02 +08:00
Ethan Feng
d58feef3a7
[CELEBORN-280] Enable Jacoco multi-module mode to collect coverage report. (#1215) 2023-03-30 14:24:42 +08:00
CVEDetect
7001f10461
[CELEBORN-482] Fix CVE dependency issue 2023-03-27 22:47:03 +08:00
Angerszhuuuu
ca79a9ce31
[CELEBORN-359][FOLLOWUP] Update LICENSE and NOTICE for ratis-shell (#1298)
* [CELEBORN-359][FOLLOWUP] Update LICENSE and NOTICE for ratis-shell
2023-03-02 15:48:23 +08:00
Angerszhuuuu
4c90e0b02a
[CELEBORN-359] Add ratis-shell to celeborn (#1292) 2023-03-01 17:04:57 +08:00
Ethan Feng
328a6ff2f5
Revert "[CELEBORN-355][BUILD] Create shaded module for Celeborn common (#1290)" (#1293)
This reverts commit 725028a10a.
2023-03-01 16:59:02 +08:00
Kerwin Zhang
725028a10a
[CELEBORN-355][BUILD] Create shaded module for Celeborn common (#1290) 2023-03-01 15:29:45 +08:00
Ethan Feng
e9b33751d3
[CELEBORN-289] Add flink integration test module. (#1229) 2023-02-21 12:25:23 +08:00
Cheng Pan
799e13d450
[CELEBORN-171][FOLLOWUP] Auto activation jdk-8 profile (#1191) 2023-01-31 19:51:33 +08:00
Ethan Feng
e219e8b44e
[CELEBORN-171] Support JDK11. (#1169) 2023-01-16 19:35:54 +08:00