Commit Graph

25 Commits

Author SHA1 Message Date
SteNicholas
4dfcd9b56b [CELEBORN-1092] Introduce JVM monitoring in Celeborn Worker using JVMQuake
### What changes were proposed in this pull request?

Introduce JVM monitoring in Celeborn Worker using JVMQuake to enable early detection of memory management issues and facilitate fast failure.

### Why are the changes needed?

When facing out-of-control memory management in Celeborn Worker we typically use JVMkill as a remedy by killing the process and generating a heap dump for post-analysis. However, even with jvmkill protection, we may still encounter issues caused by JVM running out of memory, such as repeated execution of Full GC without performing any useful work during the pause time. Since the JVM does not exhaust 100% of resources, JVMkill will not be triggered. Therefore JVMQuake is introduced to provide more granular monitoring of GC behavior, enabling early detection of memory management issues and facilitating fast failure. Refers to the principle of [jvmquake](https://github.com/Netflix-Skunkworks/jvmquake) which is a JVMTI agent that attaches to your JVM and automatically signals and kills it when the program has become unstable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`JVMQuakeSuite`

Closes #2061 from SteNicholas/CELEBORN-1092.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-28 20:45:08 +08:00
onebox-li
927d62425b
[CELEBORN-1125][FOLLOWUP] Add failureaccess shade
### What changes were proposed in this pull request?
Add failureaccess shade.

### Why are the changes needed?
When test main branch, client got error like below:
```
Caused by: java.lang.NoClassDefFoundError: org/apache/celeborn/shaded/com/google/common/util/concurrent/internal/InternalFutureFailureAccess
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3517)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3521)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2170)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2081)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:4019)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4933)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.replyGetReducerFileGroup(ReducePartitionCommitHandler.scala:283)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.handleGetReducerFileGroup(ReducePartitionCommitHandler.scala:300)
	at org.apache.celeborn.client.CommitManager.handleGetReducerFileGroup(CommitManager.scala:266)
	at org.apache.celeborn.client.LifecycleManager.org$apache$celeborn$client$LifecycleManager$$handleGetReducerFileGroup(LifecycleManager.scala:628)
	at org.apache.celeborn.client.LifecycleManager$$anonfun$receiveAndReply$1.applyOrElse(LifecycleManager.scala:314)
	at org.apache.celeborn.common.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.celeborn.common.rpc.netty.Inbox.safelyCall(Inbox.scala:222)
	at org.apache.celeborn.common.rpc.netty.Inbox.process(Inbox.scala:110)
	at org.apache.celeborn.common.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #2116 from onebox-li/shade-add-failureaccess.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-27 18:01:38 +08:00
sychen
3054813a0f
[CELEBORN-856] Add mapreduce integration test
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2073 from cxzl25/CELEBORN-856.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-22 14:36:29 +08:00
Fu Chen
aab073ab16
[CELEBORN-1125] Bump guava from 14.0.1 to 32.1.3-jre
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

- bump guava from 14.0.1 to 32.1.3-jre
- refer to https://github.com/apache/spark/pull/26911, remove usages of Guava that no longer work in Guava 27/32, and replace with workalikes. After this PR, Celeborn no longer relies on a specific version of Guava, and is compatible with Guava 14/27/32. we have the ability to specify Guava to 27 when running MapReduce integration tests.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2090 from cfmcgrady/guava-27.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-21 16:18:14 +08:00
sychen
efa22a4936 [CELEBORN-1105][FLINK] Support Flink 1.18
### What changes were proposed in this pull request?

### Why are the changes needed?

```bash
flink-1.18.0
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
```

```java
Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225)
	at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179)
	at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90)
	at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496)
```

Flink 1.18.0 release
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/

Interface `org.apache.flink.runtime.io.network.buffer.Buffer` adds `setRecycler` method.
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers

`org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor adds parameters.
[[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate
[[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```bash
flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a
Program execution finished
Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished.
Job Runtime: 1635 ms
```

<img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99">

Closes #2063 from cxzl25/CELEBORN-1105.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-06 15:53:39 +08:00
sychen
23d7c20f2f [CELEBORN-1031] SBT correct the LICENSE and NOTICE for shaded client jars
### What changes were proposed in this pull request?
Flink/Spark jars packaged with SBT use the correct LICENSE and NOTICE.

### Why are the changes needed?
https://github.com/apache/incubator-celeborn/pull/1930#discussion_r1340410526

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1967 from cxzl25/CELEBORN-1031.

Lead-authored-by: sychen <sychen@ctrip.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-06 14:17:56 +08:00
sychen
6fa669748c [CELEBORN-999] MR deps check
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
```
./dev/dependencies.sh  --module mr --check
./dev/dependencies.sh  --module mr --check --sbt
```

Closes #1928 from cxzl25/CELEBORN-999.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-10-11 13:56:31 +08:00
sychen
22f523537e [CELEBORN-1002] Add SBT MRClientProject
### What changes were proposed in this pull request?

### Why are the changes needed?

```bash
./build/make-distribution.sh --sbt-enabled -Pmr
```

```bash
./build/make-distribution.sh --sbt-enabled --release
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1930 from cxzl25/CELEBORN-1002.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-10-08 10:03:21 +08:00
Fu Chen
6b0addb934 [CELEBORN-989] Add support for making distribution package via SBT
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

Users have the capability to generate the binary distribution package using SBT by executing the following command:

```shell
./build/make-distribution.sh --sbt-enabled
```

### How was this patch tested?

Pass GA && locally tested.

Closes #1921 from cfmcgrady/sbt-make-dist-3.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-09-20 10:03:01 +08:00
sychen
beed2a85b0
[CELEBORN-977] Support RocksDB as recover DB backend
### What changes were proposed in this pull request?

### Why are the changes needed?

LevelDB does not support mac arm version.

```java
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8: dlopen(/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8, 0x0001): tried: '/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (fat file, but missing compatible architecture (have 'x86_64,i386', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (no such file), '/private/var/folders/tc/r2n_8g6j4731h7clfqwntg880000gn/T/libleveldbjni-64-1-4616234670453989010.8' (fat file, but missing compatible architecture (have 'x86_64,i386', need 'arm64'))]
  	at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
  	at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
  	at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
  	at org.apache.celeborn.service.deploy.worker.shuffledb.LevelDBProvider.initLevelDB(LevelDBProvider.java:49)
  	at org.apache.celeborn.service.deploy.worker.shuffledb.DBProvider.initDB(DBProvider.java:30)
  	at org.apache.celeborn.service.deploy.worker.storage.StorageManager.<init>(StorageManager.scala:197)
  	at org.apache.celeborn.service.deploy.worker.Worker.<init>(Worker.scala:109)
  	at org.apache.celeborn.service.deploy.worker.Worker$.main(Worker.scala:734)
  	at org.apache.celeborn.service.deploy.worker.Worker.main(Worker.scala)
```

The released `leveldbjni-all` for `org.fusesource.leveldbjni` does not support AArch64 Linux, we need to use `org.openlabtesting.leveldbjni`.

See https://issues.apache.org/jira/browse/HADOOP-16614

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
local test

Closes #1913 from cxzl25/CELEBORN-977.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-19 09:20:33 +08:00
zhouyifan279
9e01aac501
[CELEBORN-913] Implement method ShuffleDriverComponents#supportsReliableStorage
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
See https://issues.apache.org/jira/browse/SPARK-42689

### Does this PR introduce _any_ user-facing change?
Yes. User need to set `spark.shuffle.sort.io.plugin.class` to `org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO` to enable this feature.

### How was this patch tested?
Add a new matrix dimension, shuffle-plugin-class, in github ci, to run spark tests over `LocalDiskShuffleDataIO` and `CelebornShuffleDataIO` respectively.

Closes #1884 from zhouyifan279/spark-driver-component.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-07 16:25:09 +08:00
Fu Chen
142d12caa5 [CELEBORN-929][INFRA] Add dependencies check CI
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

As title

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1852 from cfmcgrady/audit-deps-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-09-07 14:02:07 +08:00
zhouyifan279
10c63e0a0f [CELEBORN-919][FOLLOWUP] Add SBT project sparkColumnarShuffle to sparkGroup
### What changes were proposed in this pull request?
Add sbt project `sparkColumnarShuffle` to `sparkGroup`

### Why are the changes needed?
Add the project `sparkColumnarShuffle` to the spark tests group `sparkGroup` to enable the columnar-related tests for SBT.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Run tests locally.

Closes #1854 from zhouyifan279/columnar-shuffle-sbt.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-06 21:26:18 +08:00
zhouyifan279
d701d3ae2c [CELEBORN-912] Support build with Spark 3.5
### What changes were proposed in this pull request?

Support build with Spark 3.5

### Why are the changes needed?

Keep up with upstream.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build with `mvn` and `sbt` locally.

Closes #1850 from zhouyifan279/build-spark-3.5.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-29 03:15:12 +00:00
zhouyifan279
2c07c55e77 [CELEBORN-919] Move Columnar Shuffle code into an individual module
### What changes were proposed in this pull request?

Move Columnar Shuffle code into an individual module

### Why are the changes needed?

Spark 3.5 made a lot of changes to AtomicType in https://issues.apache.org/jira/browse/SPARK-42887.

This causes compilation error when building columnar shuffle code.

As columnar shuffle is a configurable feature, I think it's better to move related code into a individual module. Then we can exclude this module when build with Spark 3.5 for now.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Add test `ColumnarHashBasedShuffleWriterSuiteJ` and `CelebornColumnarShuffleReaderSuite`

Closes #1843 from zhouyifan279/columnar-shuffle.

Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-28 12:19:28 +00:00
Fu Chen
5e3e9e442a
[CELEBORN-906][FOLLOWUP] Removal of redundant dependency log4j-slf4j2-impl from SBT profile spark-3.4
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

To address the CI failure introduced in https://github.com/apache/incubator-celeborn/pull/1831, this pull request resolves the issue by removing the `log4j-slf4j2-impl` dependency from SBT profile `spark-3.4`. This change is prompted by the pinning of `slf4j-api` to version 1.7.36, rendering `log4j-slf4j2-impl` unnecessary.

```
[error] Test org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ failed: java.lang.NoSuchMethodError: org.apache.logging.slf4j.Log4jLoggerFactory: method <init>()V not found, took 0.0 sec
[error]     at org.slf4j.impl.StaticLoggerBinder.<init>(StaticLoggerBinder.java:53)
[error]     at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:41)
[error]     at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
[error]     at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
[error]     at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:417)
[error]     at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:362)
[error]     at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:3[88](https://github.com/apache/incubator-celeborn/actions/runs/5974971986/job/16210071148#step:4:89))
[error]     at org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ.<clinit>(SortBasedPusherSuiteJ.java:51)
[error]     ...
[error] Test org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ failed: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.shuffle.celeborn.SortBasedPusherSuiteJ, took 0.0 sec
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     ...
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1844 from cfmcgrady/celeborn-906-followup.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-28 12:12:04 +08:00
Fu Chen
6d7c5c08ae [CELEBORN-906][BUILD] Aligning dependencies between SBT and Maven
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

This PR ensures dependency alignment between SBT and Maven, based on the audit results implemented in https://github.com/apache/incubator-celeborn/pull/1797

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA and Review

Closes #1831 from cfmcgrady/align-deps-2.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-26 16:06:47 +08:00
Fu Chen
aa35c1cafc [CELEBORN-904] Bump Spark in spark-3.3 profile from 3.3.2 to 3.3.3
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

https://www.mail-archive.com/devspark.apache.org/msg30758.html

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1828 from cfmcgrady/spark33.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-23 20:20:22 +08:00
zwangsheng
479510cb9c [CELEBORN-888][WORKER] Tweak the logic and add unit tests for the MemoryManager#currentServingState method
### What changes were proposed in this pull request?
Tweak the logic of `MemoryManager#currentServingState`

Add Unit Test for this function

```mermaid
graph TB

A(Check Used Memory) --> B{Reach Pause Replicate Threshold}
B --> | N | C{Reach Pause Push Threshold}
B --> | Y | Z(Trigger Pause Push and Replicate)
C --> | N | D{Reach Resume Threshold}
C --> | Y | Y(Trigger Pause Push but Resume Replicate)
D --> | N | E{In Pause Mode}
D --> | Y | X(Trigger Resume Push and Replicate)
E --> | N | U(Do Nothing)
E --> | Y | Y
```
### Why are the changes needed?
Make this method logical, and add unit test to ensure logic won't be accidental modification

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add Unit Test

Closes #1811 from zwangsheng/CELEBORN-888.

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-23 17:20:31 +08:00
Fu Chen
6af3b50508 [CELEBORN-884][BUILD] Consolidate all dependencies into a global object Dependencies
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

consolidate all sbt dependencies into a global object `Dependencies`, similar to Maven's dependencyManagement, to improve dependency management.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1802 from cfmcgrady/sbt-dependencies.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-08-16 10:42:59 +08:00
mingji
54622814cc
[CELEBORN-885][SPARK] Shade RoaringBitmap to avoid dependency conflicts
### What changes were proposed in this pull request?
Shade roaring bitmap to void dependency conflicts.

### Why are the changes needed?
Some user reports that celeborn client will introduce roaring bitmap conflicts.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster.

Closes #1803 from FMX/CELEBORN-885.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-08-11 12:48:12 +08:00
Fu Chen
0d1261632d [CELEBORN-880] Remove sbt compiler plugin genjavadoc-plugin
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

the plugin may generate unexpected source files in the project root directory. we need to refine this feature if we want to generate Java doc.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1799 from cfmcgrady/sbt-compiler-plugin.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-08-09 10:16:33 +08:00
Fu Chen
6ba4b7e138
[CELEBORN-850][INFRA] Add SBT CI
### What changes were proposed in this pull request?

This PR adds new GitHub Actions workflows to enable Continuous Integration using SBT based on #1764

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #1771 from cfmcgrady/sbt-ci.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-01 18:14:58 +08:00
Fu Chen
40e416c95c
[CELEBORN-843][BUILD] sbt support flink-related module build/test
### What changes were proposed in this pull request?

This PR adds packaging and testing support for Flink-related modules using SBT based on #1757

### Why are the changes needed?

improve project build speed

running flink-it tests with -Pflink-1.14

```shell
sbt:celeborn> project flink-it
sbt:flink-it> clean
sbt:flink-it> test
[success] Total time: 136 s (02:16), completed 2023-7-27 11:55:10
```

running flink-it tests with -Pflink-1.17

```shell
$ ./build/sbt -Pflink-1.17
sbt:celeborn> project flink-it
sbt:flink-it> clean
sbt:flink-it> test
[success] Total time: 168 s (02:48), completed 2023-7-27 11:28:35
```

packing and shading the flink 1.14 client

```shell
$ ./build/sbt -Pflink-1.14
sbt:celeborn> clean
sbt:celeborn> project celeborn-client-flink-1_14-shaded
sbt:celeborn-client-flink-1_14-shaded> assembly
[success] Total time: 35 s, completed 2023-7-27 11:51:54
```

packing and shading the flink 1.17 client

```shell
$ ./build/sbt -Pflink-1.17
sbt:celeborn> clean
sbt:celeborn> project celeborn-client-flink-1_17-shaded
sbt:celeborn-client-flink-1_17-shaded> assembly
[success] Total time: 39 s, completed 2023-7-27 11:49:20
```

### Does this PR introduce _any_ user-facing change?

yes

### How was this patch tested?

tested locally

Closes #1764 from cfmcgrady/sbt-flink.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-08-01 12:29:29 +08:00
Fu Chen
5f0295e9f3
[CELEBORN-836][BUILD] Initial support sbt
### What changes were proposed in this pull request?

This PR introduces the SBT build system implementation that operates independently from the current Maven build system. Different from https://github.com/apache/incubator-celeborn/pull/1627, the current implementation does not depend on `pom.xml`

The implementation enables packaging and testing functionalities for server-related modules and Spark-related modules using SBT.

For Flink-related build/test, sbt build documentation, continuous integration, and plugins, they will be submitted in separate PRs

### Why are the changes needed?

improve project build speed

packing the project.

```shell
$ ./build/sbt
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:36:12
sbt:celeborn> package
[success] Total time: 28 s, completed 2023-7-25 16:36:46
```

packing and shading the spark 3.3 client

```shell
$ ./build/sbt -Pspark-3.3
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:39:11
sbt:celeborn> project celeborn-client-spark-3-shaded
sbt:celeborn-client-spark-3-shaded> assembly
[success] Total time: 37 s, completed 2023-7-25 16:40:03
```

packing and shading the spark 2.4 client

```shell
$ ./build/sbt -Pspark-2.4
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:41:06
sbt:celeborn> project celeborn-client-spark-2-shaded
sbt:celeborn-client-spark-2-shaded> assembly
[success] Total time: 36 s, completed 2023-7-25 16:41:53
```

running server-related tests

```shell
$ ./build/sbt clean test
[success] Total time: 350 s (05:50), completed 2023-7-25 16:48:58
```

### Does this PR introduce _any_ user-facing change?

yes

### How was this patch tested?

tested locally

Closes #1757 from cfmcgrady/pure-sbt.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-28 10:40:04 +08:00