Commit Graph

50 Commits

Author SHA1 Message Date
sychen
3054813a0f
[CELEBORN-856] Add mapreduce integration test
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2073 from cxzl25/CELEBORN-856.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-22 14:36:29 +08:00
sychen
208864a807
[CELEBORN-1108][FOLLOWUP] Use rat plugin check Flink 1.18
### What changes were proposed in this pull request?

### Why are the changes needed?
Because Flink 1.18 is now supported.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2079 from cxzl25/CELEBORN-1108-FOLLOWOUP.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shaoyun Chen <csy@apache.org>
2023-11-08 12:54:23 +08:00
sychen
efa22a4936 [CELEBORN-1105][FLINK] Support Flink 1.18
### What changes were proposed in this pull request?

### Why are the changes needed?

```bash
# In the flink-1.18.0 distribution directory:
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
```

```java
Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225)
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179)
	at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90)
	at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496)
```

Flink 1.18.0 release
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/

The interface `org.apache.flink.runtime.io.network.buffer.Buffer` adds a `setRecycler` method.
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers

The `org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor takes additional parameters.
[[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate
[[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a
Program execution finished
Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished.
Job Runtime: 1635 ms
```

<img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99">

Closes #2063 from cxzl25/CELEBORN-1105.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-06 15:53:39 +08:00
sychen
0e5008db19 [CELEBORN-1108] Rat plugin check for more modules
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2068 from cxzl25/CELEBORN-1108.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-03 14:03:08 +08:00
sychen
6fa669748c [CELEBORN-999] MR deps check
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
./dev/dependencies.sh --module mr --check
./dev/dependencies.sh --module mr --check --sbt
```

Closes #1928 from cxzl25/CELEBORN-999.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-10-11 13:56:31 +08:00
sychen
8eba1b470e
[CELEBORN-1000] MR module style check
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1929 from cxzl25/CELEBORN-1000.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-09-20 16:54:42 +08:00
sychen
045c682c89 [CELEBORN-978] Improve dependency.sh replacement mode
### What changes were proposed in this pull request?

### Why are the changes needed?
When the update script is run locally, the build may print download log lines such as the following, which cause awk to exit with an error:
```
Downloading from nexus: httpxxxx
```

```bash
./dev/dependencies.sh --replace
```

```
awk: trying to access out of range field -1
 input record number 1, file
 source line number 2
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1914 from cxzl25/CELEBORN-978.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-16 09:35:13 +08:00
zhouyifan279
7ab674393b [CELEBORN-913][FOLLOWUP] Recover SBT CI jobs skipped due to last commit
### What changes were proposed in this pull request?
As title

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Verified SBT CI job list.

<img width="843" alt="image" src="https://github.com/apache/incubator-celeborn/assets/88070094/2bbaf661-8f4d-4f3a-a7e4-242484fbd9a2">

Closes #1890 from zhouyifan279/sbt-ci.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-09-09 02:12:09 +08:00
zhouyifan279
9e01aac501
[CELEBORN-913] Implement method ShuffleDriverComponents#supportsReliableStorage
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
See https://issues.apache.org/jira/browse/SPARK-42689

### Does this PR introduce _any_ user-facing change?
Yes. Users need to set `spark.shuffle.sort.io.plugin.class` to `org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO` to enable this feature.
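Besides passing `--conf spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO` to `spark-submit` for a single job, the setting can be persisted in `spark-defaults.conf`. The script below is a minimal sketch of that approach. It assumes Spark reads its defaults from `$SPARK_CONF_DIR/spark-defaults.conf`. The temp-directory fallback and the variable names are illustrative only and are not part of Celeborn.

```bash
#!/usr/bin/env bash
# Sketch: persist the Celeborn shuffle IO plugin setting in spark-defaults.conf.
# Assumption: SPARK_CONF_DIR points at the Spark conf directory; if it is unset,
# this sketch writes to a throwaway temp directory for illustration only.
set -euo pipefail

conf_dir="${SPARK_CONF_DIR:-$(mktemp -d)}"
conf_file="${conf_dir}/spark-defaults.conf"
plugin_key="spark.shuffle.sort.io.plugin.class"
plugin_value="org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO"

touch "${conf_file}"

# spark-defaults.conf holds "key<whitespace>value" lines. awk compares the first
# field exactly and exits 0 only if the key is already present.
if ! awk -v key="${plugin_key}" '$1 == key { found = 1 } END { exit !found }' "${conf_file}"; then
  printf '%s %s\n' "${plugin_key}" "${plugin_value}" >> "${conf_file}"
fi

cat "${conf_file}"
```

Re-running the script leaves an existing entry untouched. It also will not overwrite a different value that is already configured for the key, so check the printed file if the plugin does not take effect.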

### How was this patch tested?
Added a new matrix dimension, `shuffle-plugin-class`, to GitHub CI to run the Spark tests with `LocalDiskShuffleDataIO` and `CelebornShuffleDataIO` respectively.

Closes #1884 from zhouyifan279/spark-driver-component.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-07 16:25:09 +08:00
Fu Chen
142d12caa5 [CELEBORN-929][INFRA] Add dependencies check CI
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1852 from cfmcgrady/audit-deps-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-09-07 14:02:07 +08:00
zhouyifan279
d701d3ae2c [CELEBORN-912] Support build with Spark 3.5
### What changes were proposed in this pull request?

Support build with Spark 3.5

### Why are the changes needed?

Keep up with upstream.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build with `mvn` and `sbt` locally.

Closes #1850 from zhouyifan279/build-spark-3.5.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-29 03:15:12 +00:00
Fu Chen
d6c4334a11 [CELEBORN-901] Add support for Scala 2.13
### What changes were proposed in this pull request?

This PR introduces support for Scala 2.13:

1. Resolved a compilation issue specific to Scala 2.13
2. Validated compatibility with Scala 2.13 by running the full unit test suite
3. Enabled SBT CI for Scala 2.13 for the "server" module and the "spark client" module

For more detailed guidance on migrating to Scala 2.13, please consult the following resources:

1. https://www.scala-lang.org/blog/2017/02/28/collections-rework.html
2. https://docs.scala-lang.org/overviews/core/collections-migration-213.html

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1825 from cfmcgrady/scala213.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-22 20:35:05 +08:00
Fu Chen
4f47d0a56b [CELEBORN-898][INFRA] Fix java.lang.NoClassDefFoundError: org/hamcrest/SelfDescribing for SBT CI
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

Recently, I came across an intermittent SBT CI failure caused by a `NoClassDefFoundError`:

```
[error] Uncaught exception when running org.apache.celeborn.common.unsafe.PlatformUtilSuite: java.lang.NoClassDefFoundError: org/hamcrest/SelfDescribing
[error] sbt.ForkMain$ForkError: java.lang.NoClassDefFoundError: org/hamcrest/SelfDescribing
[error]    at java.lang.ClassLoader.defineClass1(Native Method)
[error]    at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
[error]    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
[error]    at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
[error]    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
[error]    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
[error]    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
[error]    at java.security.AccessController.doPrivileged(Native Method)
[error]    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
[error]    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
[error]    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
[error]    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
[error]    at org.junit.runner.Computer.getSuite(Computer.java:28)
[error]    at org.junit.runner.Request.classes(Request.java:77)
[error]    at org.junit.runner.Request.classes(Request.java:92)
[error]    at com.novocode.junit.JUnitTask.execute(JUnitTask.java:52)
[error]    at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[error]    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]    at java.lang.Thread.run(Thread.java:750)
[error] Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: org.hamcrest.SelfDescribing
[error]    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
[error]    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
[error]    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
[error]    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
[error]    at java.lang.ClassLoader.defineClass1(Native Method)
[error]    at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
[error]    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
[error]    at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
[error]    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
[error]    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
[error]    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
[error]    at java.security.AccessController.doPrivileged(Native Method)
[error]    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
```

Upon further investigation, I found that the root cause is that SBT sometimes fails to resolve Maven dependencies cached in GA:

```shell
./build/sbt "show celeborn-common/update"
```

```
[info] 		org.hamcrest:hamcrest-core:1.3:default: (MISSING) Artifact(hamcrest-core, jar, jar, None, Vector(), Some(file:/home/runner/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar), Map(), None, false)
```

This PR fixes the intermittent failure by disabling the Maven cache for SBT CI.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

https://github.com/apache/incubator-celeborn/pull/1797 passed GA after the Maven cache was disabled.

Closes #1818 from cfmcgrady/sbt-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-08-16 11:16:21 +08:00
Fu Chen
6ba4b7e138
[CELEBORN-850][INFRA] Add SBT CI
### What changes were proposed in this pull request?

This PR adds new GitHub Actions workflows to enable continuous integration with SBT, based on #1764.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1771 from cfmcgrady/sbt-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-01 18:14:58 +08:00
Cheng Pan
fd0cf11eca
[CELEBORN-738] Enable CI for Java 17
### What changes were proposed in this pull request?

Enable CI on Java 17 for the Celeborn Master/Worker and for the client with Spark 3.3/3.4.

### Why are the changes needed?

Ensure Celeborn works on Java 17.

Note: some code paths may not be covered by tests yet; we should fix them in the future.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA

Closes #1649 from pan3793/CELEBORN-738.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-29 13:47:55 +08:00
zwangsheng
5068d6e897
[CELEBORN-105][TEST] Kubernetes Integration Test
### What changes were proposed in this pull request?
Add Kubernetes Integration Test
- [x] test helm install deploy
- [ ] test shuffle

### Why are the changes needed?
Add integration test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI test

Closes #1484 from zwangsheng/CELEBORN-105.

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-05 12:11:29 +08:00
Cheng Pan
ef8e556202
[CELEBORN-604][SPARK] Support Spark 3.4 (#1509) 2023-05-24 23:10:13 +08:00
Ethan Feng
114b1b4d62
[CELEBORN-548][FLINK] Support flink 1.17. (#1472) 2023-05-05 23:00:49 +08:00
Keyong Zhou
61416a828d
[CELEBORN-497]Fix and enable JDK 11 for CI (#1401) 2023-03-31 13:39:02 +08:00
Ethan Feng
9fc77980ba
[CELEBORN-380][FLINK] Enable flink integration test in GitHub CI. (#1312) 2023-03-08 10:52:14 +08:00
Cheng Pan
88976d9fd9
[CELEBORN-375][INFRA] Enable CI on branch-* (#1307) 2023-03-03 16:49:32 +08:00
Cheng Pan
0c29c5dd57
[CELEBORN-180][BUILD][FOLLOWUP] Update CI workflow and docs (#1134) 2023-01-03 17:58:51 +08:00
Cheng Pan
bf97a2227b
[CELEBORN-163][BUILD] Rename Flink modules and enable Flink CI (#1110) 2022-12-21 23:54:47 +08:00
Cheng Pan
7105f98829
[CELEBORN-160][BUILD] Split CI workflow (#1107) 2022-12-21 23:47:01 +08:00
Binjie Yang
853d0df191
[CELEBORN-149] Upload failure CI unit test logs for developer debug (#1094) 2022-12-16 04:31:13 +08:00
Cheng Pan
17f45453e3
[CELEBORN-135][INFRA] Remove unused GitHub slash workflow (#1080) 2022-12-14 10:13:13 +08:00
Cheng Pan
339d585469
[CELEBORN-134][INFRA] Enhance PULL_REQUEST_TEMPLATE (#1079) 2022-12-14 10:07:56 +08:00
Ethan Feng
e68954dbae
[CELEBORN-81] Add codecov. (#1026) 2022-12-09 12:11:45 +08:00
Cheng Pan
9bf4c65357
[CELEBORN-72][DOCS] Remove unused website resources from main repo (#1014) 2022-11-28 09:47:30 +08:00
Binjie Yang
f51fae6c75
[REFACTOR] Replace the missing Remote Shuffle Service (#885) 2022-10-28 17:37:59 +08:00
Cheng Pan
873eeeb1ed
[BUILD] Add apache- prefix in release tarball name (#854) 2022-10-25 22:39:48 +08:00
Ethan Feng
196f3800c2
Fix incorrect action command type configs. (#836) 2022-10-24 10:05:11 +08:00
Ethan Feng
77ff94ebf0
[FEATURE]Add action to process stale pull requests. (#817) 2022-10-21 10:39:51 +08:00
Ethan Feng
6f914997ed
[FEATURE]Add slash regression trigger. (#823) 2022-10-20 23:35:25 +08:00
Ethan Feng
b708d63678
Optimize regression logic. (#793) 2022-10-18 20:31:46 +08:00
Ethan Feng
b4b19892eb
[Feature]Add ci scripts for regression and benchmark. (#766) 2022-10-17 16:46:27 +08:00
Cheng Pan
2c82e98a94
Use Google mirror in GA (#755) 2022-10-11 11:57:41 +08:00
Cheng Pan
f2ca6d68e4
[DOCS] Build website (#579) 2022-09-10 00:45:13 +08:00
Cheng Pan
4b42219595
Remove log4j1 (#501) 2022-09-05 19:30:15 +08:00
Cheng Pan
c88ce306be
Use Spotless to auto check and reformat Java/Scala code (#497) 2022-09-01 21:19:56 +08:00
Cheng Pan
3dddb65f31
Enable Apache Rat and fix license header (#492) 2022-08-31 23:53:33 +08:00
Keyong Zhou
11762a260b
[ISSUE-417] handleUnregisterShuffle and StageEnd trigger double handl… (#420)
1. Unregistering the shuffle triggers handleStageEnd:
```
22/08/22 12:47:00 INFO LifecycleManager: Call StageEnd before Unregister Shuffle 60.
```
2. handleStageEnd succeeds, possibly triggered by either handleUnregisterShuffle or StageEnd:
```
22/08/22 12:47:51 INFO LifecycleManager: Succeed to handle stageEnd for 60.
```
3. Data loss is reported:
```
22/08/22 12:48:28 ERROR LifecycleManager: For 60 partition 2185-0: data lost.
22/08/22 12:48:28 ERROR LifecycleManager: Failed to handle stageEnd for 60, lost file!
```
4. Unregister is reported as successful:
```
22/08/22 12:48:28 INFO LifecycleManager: Unregister for 60 success.
```
2022-08-22 17:13:08 +08:00
Cheng Pan
9b6ec58e2a
Add profile for Spark 3.2/3.3 (#380) 2022-08-17 22:27:43 +08:00
Cheng Pan
bb0c9b21fc
[ISSUE-350] Rewrite RssShuffleManager using Java to pass compile on Spark 3.1+ (#370) 2022-08-17 15:59:50 +08:00
Keyong Zhou
f0794dcb9a
Add DOC template (#373) 2022-08-17 11:36:43 +08:00
Cheng Pan
f1f4b894af
Build: Enhance build system (#349) 2022-08-15 14:59:01 +08:00
mingji
e43720f6a1 Add refactor requests template. 2022-07-08 17:12:16 +08:00
Ethan Feng
9ad8254b0a
AQE support. (#67) 2022-04-01 20:19:01 +08:00
wangshengjie123
b2a6091b55
[Feature] Make log4j2 as optional in case to we can update log4j2.xml to change log level (#56) 2022-03-08 22:33:06 +08:00
zky.zhoukeyong
ba5920acde Initial Commit for RSS 2021-12-28 20:57:35 +08:00