Commit Graph

9 Commits

Author SHA1 Message Date
Mridul Muralidharan
3a41db360b
[CELEBORN-1006] Add support for Apache Hadoop 2.x in Celeborn build
Add support for Apache Hadoop 2.x in Celeborn build
Developers need to only specify their `hadoop.version`, and the build will pick the right profile internally based on the version to add the relevant dependencies.

[hadoop-client-api](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api) and [hadoop-client-runtime](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-runtime) were introduced in hadoop 3.x, while hadoop 2.x had [hadoop-client](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client)
Celeborn depends on the former, and so requires hadoop 3.x to build.

Apache Spark dropped support for Hadoop 2.x only in the recent v3.5 ([SPARK-42452](https://issues.apache.org/jira/browse/SPARK-42452)). Given this, we have case where deployments on supported platforms like Spark 3.4 and older running on 2.x hadoop, will need to pull in hadoop 3.x just for Celeborn.

This PR uses `hadoop-client` when `hadoop.version` is specified as 2.x - and preserves existing behavior when `hadoop.version` is 3.x

Note - while using `hadoop-client` in 3.x is an option, hadoop community recommendation is to rely on `hadoop-client-api`/`hadoop-client-runtime`, hence making an effort to leverage that as much as possible.

Adds support for using 2.x for hadoop.version

Three combinations were tested:

* Default, without overriding hadoop.version

Dependencies:
```
$ build/mvn dependency:list 2>&1 | grep hadoop | sort | uniq
[INFO]    org.apache.hadoop:hadoop-client-api:jar:3.2.4:compile
[INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.2.4:compile
```

Will update this section again based on test suite results (which are ongoing)

* Setting hadoop.version to newer 3.3.0 explicitly

Dependencies:
```
$ ARGS="-Pspark-3.1 -Dhadoop.version=3.3.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | sort | uniq
[INFO]    org.apache.hadoop:hadoop-client-api:jar:3.3.0:compile
[INFO]    org.apache.hadoop:hadoop-client-runtime:jar:3.3.0:compile
```

* Setting hadoop.version to older 2.10.0

Dependencies:
```
$ ARGS="-Pspark-3.1 -Dhadoop.version=2.10.0" ; build/mvn dependency:list $ARGS 2>&1 | grep hadoop | grep compile | sort | uniq
[INFO]    org.apache.hadoop:hadoop-auth:jar:2.10.0:compile -- module hadoop.auth (auto)
[INFO]    org.apache.hadoop:hadoop-client:jar:2.10.0:compile -- module hadoop.client (auto)
[INFO]    org.apache.hadoop:hadoop-common:jar:2.10.0:compile -- module hadoop.common (auto)
[INFO]    org.apache.hadoop:hadoop-hdfs-client:jar:2.10.0:compile -- module hadoop.hdfs.client (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.10.0:compile -- module hadoop.mapreduce.client.app (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.10.0:compile -- module hadoop.mapreduce.client.common (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.10.0:compile -- module hadoop.mapreduce.client.core (auto)
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.10.0:compile
[INFO]    org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.10.0:compile -- module hadoop.mapreduce.client.shuffle (auto)
[INFO]    org.apache.hadoop:hadoop-yarn-api:jar:2.10.0:compile -- module hadoop.yarn.api (auto)
[INFO]    org.apache.hadoop:hadoop-yarn-common:jar:2.10.0:compile -- module hadoop.yarn.common (auto)
```

For each of the case above, build/test passes for each of the `ARGS`.

Closes #1936 from mridulm/main.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-25 20:15:02 +08:00
mingji
e0c00ecd38 [CELEBORN-839][MR] Support Hadoop MapReduce
### What changes were proposed in this pull request?
1. Map side merge and push.
2. Support hadoop2 & 3.
3. Reduce in-memory merge.
4. Integrate LifecycleManager to RmApplicationMaster.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Cluster.

I tested this PR on a cluster with a 4x 16 CPU 64G Mem 4ESSD cluster.
Hadoop 2.8.5

1TB Terasort, 8400 mappers, 1000 reducers
Celeborn 81min vs MR shuffle 89min
![mr1](https://github.com/apache/incubator-celeborn/assets/4150993/a3cf6493-b6ff-4c03-9936-4558cf22761d)
![mr2](https://github.com/apache/incubator-celeborn/assets/4150993/9119ffb4-6996-4b77-bcdf-cbd6db5c096f)

1GB wordcount, 8 mappers, 8 reducers
Celeborn 35s VS MR shuffle 38s
![mr3](https://github.com/apache/incubator-celeborn/assets/4150993/907dce24-16b7-4788-ab5d-5b784fd07d47)
![mr4](https://github.com/apache/incubator-celeborn/assets/4150993/8e8065b9-6c46-4c8d-9e71-45eed8e63877)

Closes #1830 from FMX/CELEBORN-839.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-14 14:12:53 +08:00
Fu Chen
3fb896b11f [CELEBORN-666] Define protobuf-maven-plugin in the root pom.xml
### What changes were proposed in this pull request?

Define `protobuf-maven-plugin` in the root pom.xml

### Why are the changes needed?

to fix

```bash
build/mvn protobuf:compile -am -pl common
```

```
[ERROR] No plugin found for prefix 'protobuf' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/Users/fchen/.m2/repository), apache.snapshots (https://repository.apache.org/snapshots), central (https://repo.maven.apache.org/maven2)] -> [Help 1]
org.apache.maven.plugin.prefix.NoPluginFoundForPrefixException: No plugin found for prefix 'protobuf' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/Users/fchen/.m2/repository), apache.snapshots (https://repository.apache.org/snapshots), central (https://repo.maven.apache.org/maven2)]
    at org.apache.maven.plugin.prefix.internal.DefaultPluginPrefixResolver.resolve (DefaultPluginPrefixResolver.java:95)
    at org.apache.maven.lifecycle.internal.MojoDescriptorCreator.findPluginForPrefix (MojoDescriptorCreator.java:266)
    at org.apache.maven.lifecycle.internal.MojoDescriptorCreator.getMojoDescriptor (MojoDescriptorCreator.java:220)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments (DefaultLifecycleTaskSegmentCalculator.java:104)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments (DefaultLifecycleTaskSegmentCalculator.java:83)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:89)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:298)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoPluginFoundForPrefixException
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

tested locally.

Closes #1579 from cfmcgrady/protobuf-plugin.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-12 19:46:46 +08:00
Cheng Pan
76533d7324
[CELEBORN-650][TEST] Upgrade scalatest and unify mockito version
### What changes were proposed in this pull request?

This PR upgrades

- `mockito` from 1.10.19 and 3.6.0 to 4.11.0
- `scalatest` from 3.2.3 to 3.2.16
- `mockito-scalatest` from 1.16.37 to 1.17.14

### Why are the changes needed?

Housekeeping, making test dependencies up-to-date and unified.

### Does this PR introduce _any_ user-facing change?

No, it only affects test.

### How was this patch tested?

Pass GA.

Closes #1562 from pan3793/CELEBORN-650.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-09 10:04:14 +08:00
Angerszhuuuu
811e192bbd
[CELEBORN-446] Support rack aware during assign slots for ROUNDROBIN (#1370) 2023-05-18 13:58:51 +08:00
Angerszhuuuu
4c90e0b02a
[CELEBORN-359] Add ratis-shell to celeborn (#1292) 2023-03-01 17:04:57 +08:00
Fu Chen
ab449ffdd7
[CELEBORN-198] Fix the wrong configuration path of plugin protobuf-maven-plugin and … (#1146) 2023-01-05 20:09:31 +08:00
Cheng Pan
96e969f46e
[BUILD] Extract project.version to Maven Property (#772) 2022-10-16 19:01:40 +08:00
Cheng Pan
ab16b4f101
[INFRA] Rename modules w/ celeborn prefix (#723) 2022-10-08 08:05:57 +08:00