### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2073 from cxzl25/CELEBORN-856.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
Because Flink 1.18 is now supported.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2079 from cxzl25/CELEBORN-1108-FOLLOWOUP.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shaoyun Chen <csy@apache.org>
### What changes were proposed in this pull request?
### Why are the changes needed?
```bash
flink-1.18.0
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
```
```java
Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V
at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225)
at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179)
at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90)
at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120)
at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524)
at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496)
```
Flink 1.18.0 release
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
The `org.apache.flink.runtime.io.network.buffer.Buffer` interface adds a `setRecycler` method:
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers
The `org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor takes additional parameters:
[[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate
[[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient
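The stack trace above shows the plugin calling a `SingleInputGate` constructor signature that no longer exists in Flink 1.18, so the failure only surfaces at runtime as a `NoSuchMethodError`. This PR text does not describe how the fix is structured; the sketch below only illustrates one common defensive technique, probing a class via reflection for a method or constructor signature before taking a version-specific code path. `SignatureProbe` is a hypothetical helper, and JDK classes stand in for Flink's `Buffer` and `SingleInputGate` so that it runs without Flink on the classpath.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Hypothetical helper (not part of Celeborn or Flink): checks whether a class
// exposes a given method or constructor signature at runtime.
final class SignatureProbe {

    private SignatureProbe() {}

    /** Returns true if clazz declares or inherits a public method with this name and these parameter types. */
    static boolean hasPublicMethod(Class<?> clazz, String name, Class<?>... paramTypes) {
        try {
            Method ignored = clazz.getMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns true if clazz declares a constructor (any visibility) with exactly these parameter types. */
    static boolean hasConstructor(Class<?> clazz, Class<?>... paramTypes) {
        try {
            Constructor<?> ignored = clazz.getDeclaredConstructor(paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for the Flink classes so the sketch is self-contained.
        System.out.println("String.length(): " + hasPublicMethod(String.class, "length"));
        System.out.println("StringBuilder(int): " + hasConstructor(StringBuilder.class, int.class));
        System.out.println("StringBuilder(long): " + hasConstructor(StringBuilder.class, long.class));
    }
}
```

Reflection probing keeps a single binary working across versions at the cost of indirection; building a separate module per Flink version is the usual alternative when signatures diverge this much.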
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```bash
flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a
Program execution finished
Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished.
Job Runtime: 1635 ms
```
<img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99">
Closes #2063 from cxzl25/CELEBORN-1105.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2068 from cxzl25/CELEBORN-1108.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```
./dev/dependencies.sh --module mr --check
./dev/dependencies.sh --module mr --check --sbt
```
Closes #1928 from cxzl25/CELEBORN-999.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1929 from cxzl25/CELEBORN-1000.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
When the update script is run locally, it may print log lines such as the following, which cause awk to exit with an error:
```
Downloading from nexus: httpxxxx
```
```bash
./dev/dependencies.sh --replace
```
```
awk: trying to access out of range field -1
input record number 1, file
source line number 2
```
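The failure comes from parsing every output line as if it were a dependency entry, so a log line like `Downloading from nexus: ...` reaches code that expects a different field layout. The sketch below shows the general idea of filtering such noise before extracting fields. It is written in Java for illustration only and is not the script's actual logic; `DependencyLineFilter` and the assumed `groupId:artifactId[:...]:version` coordinate format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter: keeps only lines that look like colon-separated Maven
// coordinates (at least three non-empty fields, no spaces) and drops log noise.
final class DependencyLineFilter {

    // Assumed coordinate shape, e.g. "org.hamcrest:hamcrest-core:1.3" or
    // "org.apache.commons:commons-lang3:jar:3.12.0:compile".
    private static final Pattern COORDINATE =
        Pattern.compile("[A-Za-z0-9_.\\-]+(:[A-Za-z0-9_.\\-]+){2,}");

    static List<String> keepCoordinates(List<String> rawLines) {
        List<String> kept = new ArrayList<>();
        for (String line : rawLines) {
            String trimmed = line.trim();
            // Empty lines and messages such as "Downloading from nexus: ..." do not match.
            if (COORDINATE.matcher(trimmed).matches()) {
                kept.add(trimmed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> output = Arrays.asList(
            "Downloading from nexus: https://example.invalid/repo",
            "org.hamcrest:hamcrest-core:1.3");
        System.out.println(keepCoordinates(output));
    }
}
```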
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1914 from cxzl25/CELEBORN-978.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Verified SBT CI job list.
<img width="843" alt="image" src="https://github.com/apache/incubator-celeborn/assets/88070094/2bbaf661-8f4d-4f3a-a7e4-242484fbd9a2">
Closes #1890 from zhouyifan279/sbt-ci.
Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
See https://issues.apache.org/jira/browse/SPARK-42689
### Does this PR introduce _any_ user-facing change?
Yes. Users need to set `spark.shuffle.sort.io.plugin.class` to `org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO` to enable this feature.
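As a small illustration of the setting above, the sketch turns a config map into `spark-submit --conf key=value` arguments. Only the key and value come from this PR; `CelebornDataIoConf` and `toSubmitArgs` are hypothetical names, and the other Celeborn client settings that a working setup still needs are not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: renders Spark settings as spark-submit "--conf key=value" arguments.
final class CelebornDataIoConf {

    // Key and value taken from this PR description.
    static final String IO_PLUGIN_KEY = "spark.shuffle.sort.io.plugin.class";
    static final String CELEBORN_DATA_IO = "org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO";

    static List<String> toSubmitArgs(Map<String, String> conf) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the generated arguments are stable.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(IO_PLUGIN_KEY, CELEBORN_DATA_IO);
        System.out.println(String.join(" ", toSubmitArgs(conf)));
    }
}
```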
### How was this patch tested?
Added a new matrix dimension, `shuffle-plugin-class`, to GitHub CI to run the Spark tests with `LocalDiskShuffleDataIO` and `CelebornShuffleDataIO` respectively.
Closes #1884 from zhouyifan279/spark-driver-component.
Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #1852 from cfmcgrady/audit-deps-ci.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
Support building with Spark 3.5.
### Why are the changes needed?
Keep up with upstream.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Build with `mvn` and `sbt` locally.
Closes #1850 from zhouyifan279/build-spark-3.5.
Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
This PR introduces support for Scala 2.13:
1. Resolved a compilation issue specific to Scala 2.13
2. Validated compatibility with Scala 2.13 by running the full unit test suite
3. Enabled SBT CI for Scala 2.13 for the "server" module and the "spark client" module
For more detailed guidance on migrating to Scala 2.13, please consult the following resources:
1. https://www.scala-lang.org/blog/2017/02/28/collections-rework.html
2. https://docs.scala-lang.org/overviews/core/collections-migration-213.html
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #1825 from cfmcgrady/scala213.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Disable the Maven cache for SBT CI.
### Why are the changes needed?
Recently, I came across an intermittent SBT CI failure caused by a `NoClassDefFoundError`:
```
[error] Uncaught exception when running org.apache.celeborn.common.unsafe.PlatformUtilSuite: java.lang.NoClassDefFoundError: org/hamcrest/SelfDescribing
[error] sbt.ForkMain$ForkError: java.lang.NoClassDefFoundError: org/hamcrest/SelfDescribing
[error] at java.lang.ClassLoader.defineClass1(Native Method)
[error] at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
[error] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
[error] at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
[error] at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
[error] at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
[error] at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
[error] at java.security.AccessController.doPrivileged(Native Method)
[error] at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
[error] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
[error] at org.junit.runner.Computer.getSuite(Computer.java:28)
[error] at org.junit.runner.Request.classes(Request.java:77)
[error] at org.junit.runner.Request.classes(Request.java:92)
[error] at com.novocode.junit.JUnitTask.execute(JUnitTask.java:52)
[error] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:750)
[error] Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: org.hamcrest.SelfDescribing
[error] at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
[error] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
[error] at java.lang.ClassLoader.defineClass1(Native Method)
[error] at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
[error] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
[error] at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
[error] at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
[error] at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
[error] at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
[error] at java.security.AccessController.doPrivileged(Native Method)
[error] at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
```
Upon further investigation, I found that the root cause is that SBT sometimes fails to resolve Maven dependencies cached in GA:
```shell
./build/sbt "show celeborn-common/update"
```
```
[info] org.hamcrest:hamcrest-core:1.3:default: (MISSING) Artifact(hamcrest-core, jar, jar, None, Vector(), Some(file:/home/runner/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar), Map(), None, false)
```
This PR fixes the intermittent failure by disabling the Maven cache for SBT CI.
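The `MISSING` entry above points at a jar inside the cached local Maven repository. The sketch below maps a `groupId:artifactId:version` coordinate to its path in the standard Maven repository layout, which is handy when checking by hand whether a cached artifact is actually present. `LocalRepoPath` is a hypothetical helper, not part of SBT or Celeborn.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves a jar's location in a Maven local repository
// using the standard layout groupId-as-dirs/artifactId/version/artifactId-version.jar.
final class LocalRepoPath {

    static Path jarPath(Path repoRoot, String groupId, String artifactId, String version) {
        Path dir = repoRoot;
        // "org.hamcrest" becomes the directories org/hamcrest
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part);
        }
        return dir.resolve(artifactId)
                  .resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path jar = jarPath(repo, "org.hamcrest", "hamcrest-core", "1.3");
        System.out.println(jar + " exists: " + Files.exists(jar));
    }
}
```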
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
https://github.com/apache/incubator-celeborn/pull/1797 passed GA after the Maven cache was disabled.
Closes #1818 from cfmcgrady/sbt-ci.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
This PR adds new GitHub Actions workflows to enable continuous integration with SBT, building on #1764.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #1771 from cfmcgrady/sbt-ci.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Enable CI on Java 17 for Celeborn Master/Worker and for the Client with Spark 3.3/3.4.
### Why are the changes needed?
Ensure Celeborn works on Java 17.
Note: some code paths may not be covered by tests; we should address them in the future.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA
Closes #1649 from pan3793/CELEBORN-738.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Add a Kubernetes integration test:
- [x] test helm install deploy
- [ ] test shuffle
### Why are the changes needed?
To add integration test coverage for deploying Celeborn on Kubernetes with Helm.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI test
Closes #1484 from zwangsheng/CELEBORN-105.
Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
1. Unregistering the shuffle triggers handleStageEnd
```
22/08/22 12:47:00 INFO LifecycleManager: Call StageEnd before Unregister Shuffle 60.
```
2. handleStageEnd succeeds, possibly triggered by either handleUnregisterShuffle or StageEnd
```
22/08/22 12:47:51 INFO LifecycleManager: Succeed to handle stageEnd for 60.
```
3. LifecycleManager then reports data lost
```
22/08/22 12:48:28 ERROR LifecycleManager: For 60 partition 2185-0: data lost.
22/08/22 12:48:28 ERROR LifecycleManager: Failed to handle stageEnd for 60, lost file!
```
4. Unregister is reported as successful
```
22/08/22 12:48:28 INFO LifecycleManager: Unregister for 60 success.
```
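The log sequence suggests that stage-end handling for shuffle 60 ran twice: once successfully, and again afterwards, when it reported lost data even though the stage had already ended. The text does not state the actual fix. A minimal sketch of one way to make such handling idempotent is shown below, assuming stage end should run at most once per shuffle ID no matter which path triggers it. `StageEndGuard` and `tryHandleStageEnd` are hypothetical names, not Celeborn's `LifecycleManager` API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: lets the stage-end handler run at most once per shuffle ID,
// whether it is triggered by a StageEnd request or by unregistering the shuffle.
final class StageEndGuard {

    // Thread-safe set of shuffle IDs whose stage end has been claimed.
    private final Set<Integer> stageEnded = ConcurrentHashMap.newKeySet();

    /** Runs handler and returns true only for the first caller for shuffleId. */
    boolean tryHandleStageEnd(int shuffleId, Runnable handler) {
        // add() is atomic: exactly one caller observes true for a given ID.
        if (!stageEnded.add(shuffleId)) {
            return false; // already handled, or being handled, by another trigger
        }
        handler.run();
        return true;
    }

    public static void main(String[] args) {
        StageEndGuard guard = new StageEndGuard();
        boolean viaUnregister = guard.tryHandleStageEnd(60, () -> System.out.println("handling stage end for 60"));
        boolean viaStageEnd = guard.tryHandleStageEnd(60, () -> System.out.println("handling stage end for 60"));
        System.out.println("unregister path ran: " + viaUnregister + ", StageEnd path ran: " + viaStageEnd);
    }
}
```

In this sketch the second caller returns immediately, even if the first is still running the handler; a real implementation may also need that caller to wait for the first one to finish before reporting unregister success.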