Commit Graph

6 Commits

Author SHA1 Message Date
onebox-li
927d62425b
[CELEBORN-1125][FOLLOWUP] Add failureaccess shade
### What changes were proposed in this pull request?
Add failureaccess shade.

### Why are the changes needed?
When test main branch, client got error like below:
```
Caused by: java.lang.NoClassDefFoundError: org/apache/celeborn/shaded/com/google/common/util/concurrent/internal/InternalFutureFailureAccess
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3517)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LoadingValueReference.<init>(LocalCache.java:3521)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2170)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2081)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:4019)
	at org.apache.celeborn.shaded.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4933)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.replyGetReducerFileGroup(ReducePartitionCommitHandler.scala:283)
	at org.apache.celeborn.client.commit.ReducePartitionCommitHandler.handleGetReducerFileGroup(ReducePartitionCommitHandler.scala:300)
	at org.apache.celeborn.client.CommitManager.handleGetReducerFileGroup(CommitManager.scala:266)
	at org.apache.celeborn.client.LifecycleManager.org$apache$celeborn$client$LifecycleManager$$handleGetReducerFileGroup(LifecycleManager.scala:628)
	at org.apache.celeborn.client.LifecycleManager$$anonfun$receiveAndReply$1.applyOrElse(LifecycleManager.scala:314)
	at org.apache.celeborn.common.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.celeborn.common.rpc.netty.Inbox.safelyCall(Inbox.scala:222)
	at org.apache.celeborn.common.rpc.netty.Inbox.process(Inbox.scala:110)
	at org.apache.celeborn.common.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #2116 from onebox-li/shade-add-failureaccess.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-27 18:01:38 +08:00
sychen
4465a9229b [CELEBORN-1048][FOLLOWUP] MR module compile
### What changes were proposed in this pull request?
Let the MR module compile successfully.

### Why are the changes needed?
#2000 added parameters in the `ShuffleClient#readPartition` method, resulting in MR module compilation failure.

MR CI is still missing.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
local test
```bash
./build/make-distribution.sh -Pmr
```

Closes #2069 from cxzl25/CELEBORN-1048-FOLLOWUP.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-04 20:21:47 +08:00
sychen
22f523537e [CELEBORN-1002] Add SBT MRClientProject
### What changes were proposed in this pull request?

### Why are the changes needed?

```bash
./build/make-distribution.sh --sbt-enabled -Pmr
```

```bash
./build/make-distribution.sh --sbt-enabled --release
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1930 from cxzl25/CELEBORN-1002.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-10-08 10:03:21 +08:00
Cheng Pan
78553f1418
[CELEBORN-1003] Correct the LICENSE and NOTICE for shaded client jars
### What changes were proposed in this pull request?

Correct the `LICENSE` and `NOTICE` for the following shaded client jars

- `celeborn-client-flink-1.14-shaded_2.12-<version>.jar`
- `celeborn-client-flink-1.15-shaded_2.12-<version>.jar`
- `celeborn-client-flink-1.17-shaded_2.12-<version>.jar`
- `celeborn-client-mr-shaded_2.12-<version>.jar`
- `celeborn-client-spark-2-shaded_2.11-<version>.jar`
- `celeborn-client-spark-3-shaded_2.12-<version>.jar`

### Why are the changes needed?

The `LICENSE` and `NOTICE` shipped in a jar should match the content of the jar, for shaded jars, it should acknowledge all the third-party classes that are bundled.

See more discussion at https://lists.apache.org/thread/8v4wy5o132rpsjync6465zztgjlf6h5p

For how to determine which third-party jars are bundled, take `celeborn-client-spark-3-shaded_2.12-<version>.jar` as an example, the following command performs the packaging, and we can find them out by looking at logs like `Including ... in the shaded jar`

```
build/mvn clean package -DskipTests -pl :celeborn-client-spark-3-shaded_2.12 -am -Pspark-3.3
```

```
[INFO] --- maven-shade-plugin:3.4.0:shade (default)  celeborn-client-spark-3-shaded_2.12 ---
[INFO] Including org.apache.celeborn:celeborn-client-spark-3_2.12🫙0.4.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.celeborn:celeborn-common_2.12🫙0.4.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.12.0 in the shaded jar.
[INFO] Including io.netty:netty-all:jar:4.1.93.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.93.Final in the shaded jar.
...
[INFO] Excluding org.apache.ratis:ratis-common:jar:2.5.1 from the shaded jar.
[INFO] Excluding org.apache.ratis:ratis-thirdparty-misc:jar:1.0.4 from the shaded jar.
[INFO] Excluding org.apache.ratis:ratis-proto:jar:2.5.1 from the shaded jar.
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.

Closes #1933 from pan3793/CELEBORN-1003.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-28 19:23:54 +08:00
Mridul Muralidharan
3a41db360b
[CELEBORN-1006] Add support for Apache Hadoop 2.x in Celeborn build
Add support for Apache Hadoop 2.x in Celeborn build
Developers need to only specify their `hadoop.version`, and the build will pick the right profile internally based on the version to add the relevant dependencies.

[hadoop-client-api](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api) and [hadoop-client-runtime](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-runtime) were introduced in hadoop 3.x, while hadoop 2.x had [hadoop-client](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client)
Celeborn depends on the former, and so requires hadoop 3.x to build.

Apache Spark dropped support for Hadoop 2.x only in the recent v3.5 ([SPARK-42452](https://issues.apache.org/jira/browse/SPARK-42452)). Given this, we have case where deployments on supported platforms like Spark 3.4 and older running on 2.x hadoop, will need to pull in hadoop 3.x just for Celeborn.

This PR uses `hadoop-client` when `hadoop.version` is specified as 2.x - and preserves existing behavior when `hadoop.version` is 3.x

Note - while using `hadoop-client` in 3.x is an option, hadoop community recommendation is to rely on `hadoop-client-api`/`hadoop-client-runtime`, hence making an effort to leverage that as much as possible.

Adds support for using 2.x for hadoop.version

Three combinations were tested:

* Default, without overriding hadoop.version

Dependencies:
```
$ build/mvn dependency:list 2>&1 | grep hadoop | sort | uniq
[INFO]    org.apache.hadoop:hadoop-client-api:jar:3.2.4:compile
[INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.2.4:compile
```

Will update this section again based on test suite results (which are ongoing)

* Setting hadoop.version to newer 3.3.0 explicitly

Dependencies:
```
$ ARGS="-Pspark-3.1 -Dhadoop.version=3.3.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | sort | uniq
[INFO]    org.apache.hadoop:hadoop-client-api:jar:3.3.0:compile
[INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.3.0:compile
```

* Setting hadoop.version to older 2.10.0

Dependencies:
```
$ ARGS="-Pspark-3.1 -Dhadoop.version=2.10.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | grep compile | sort | uniq
[INFO]    org.apache.hadoop:hadoop-auth:jar:2.10.0:compile -- module hadoop.auth (auto)
[INFO]    org.apache.hadoop:hadoop-client:jar:2.10.0:compile -- module hadoop.client (auto)
[INFO]    org.apache.hadoop:hadoop-common:jar:2.10.0:compile -- module hadoop.common (auto)
[INFO]    org.apache.hadoop:hadoop-hdfs-client:jar:2.10.0:compile -- module hadoop.hdfs.client (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.10.0:compile -- module hadoop.mapreduce.client.app (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.10.0:compile -- module hadoop.mapreduce.client.common (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.10.0:compile -- module hadoop.mapreduce.client.core (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.10.0:compile
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.10.0:compile -- module hadoop.mapreduce.client.shuffle (auto)
[INFO]    org.apache.hadoop:hadoop-yarn-api:jar:2.10.0:compile -- module hadoop.yarn.api (auto)
[INFO]    org.apache.hadoop:hadoop-yarn-common:jar:2.10.0:compile -- module hadoop.yarn.common (auto)
```

For each of the case above, build/test passes for each of the `ARGS`.

Closes #1936 from mridulm/main.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-25 20:15:02 +08:00
mingji
e0c00ecd38 [CELEBORN-839][MR] Support Hadoop MapReduce
### What changes were proposed in this pull request?
1. Map side merge and push.
2. Support hadoop2 & 3.
3. Reduce in-memory merge.
4. Integrate LifecycleManager to RmApplicationMaster.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Cluster.

I tested this PR on a cluster with a 4x 16 CPU 64G Mem 4ESSD cluster.
Hadoop 2.8.5

1TB Terasort, 8400 mappers, 1000 reducers
Celeborn 81min vs MR shuffle 89min
![mr1](https://github.com/apache/incubator-celeborn/assets/4150993/a3cf6493-b6ff-4c03-9936-4558cf22761d)
![mr2](https://github.com/apache/incubator-celeborn/assets/4150993/9119ffb4-6996-4b77-bcdf-cbd6db5c096f)

1GB wordcount, 8 mappers, 8 reducers
Celeborn 35s VS MR shuffle 38s
![mr3](https://github.com/apache/incubator-celeborn/assets/4150993/907dce24-16b7-4788-ab5d-5b784fd07d47)
![mr4](https://github.com/apache/incubator-celeborn/assets/4150993/8e8065b9-6c46-4c8d-9e71-45eed8e63877)

Closes #1830 from FMX/CELEBORN-839.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-14 14:12:53 +08:00