

license
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Building via SBT

Starting from version 0.4.0, the Celeborn project supports building and packaging using SBT. This article provides a detailed guide on how to build the Celeborn project using SBT.

System Requirements

Celeborn Service (master/worker) supports Scala 2.11/2.12/2.13 and Java 8/11/17.

The following table indicates the compatibility of Celeborn Spark and Flink clients with different versions of Spark and Flink for various Java and Scala versions:

Java 8/Scala 2.11 | Java 8/Scala 2.12 | Java 11/Scala 2.12 | Java 17/Scala 2.12 | Java 8/Scala 2.13 | Java 11/Scala 2.13 | Java 17/Scala 2.13
Spark 2.4
Spark 3.0
Spark 3.1
Spark 3.2
Spark 3.3
Spark 3.4
Spark 3.5
Spark 4.0
Flink 1.16
Flink 1.17
Flink 1.18
Flink 1.19
Flink 1.20
Flink 2.0
Flink 2.1

Useful SBT commands

Packaging the Project

As an example, one can build a version of Celeborn as follows:

./build/sbt clean package

To create a runnable Celeborn distribution, laid out like those available from the Celeborn Downloads page, use ./build/make-distribution.sh in the project root directory:

./build/make-distribution.sh --sbt-enabled --release

Maven-Style Profile Management

Celeborn uses Maven-style profile management for its client modules. For example, you can enable the Spark 3.3 client module by adding -Pspark-3.3:

# ./build/sbt -Pspark-3.3 projects

[info] set current project to celeborn (in build file:/root/celeborn/)
[info] In file:/root/celeborn/
[info]   * celeborn
[info]     celeborn-client
[info]     celeborn-client-spark-3
[info]     celeborn-client-spark-3-shaded
[info]     celeborn-common
[info]     celeborn-master
[info]     celeborn-service
[info]     celeborn-spark-common
[info]     celeborn-spark-group
[info]     celeborn-spark-it
[info]     celeborn-worker

To enable the Flink 1.16 client module, add -Pflink-1.16:

# ./build/sbt -Pflink-1.16 projects

[info] set current project to celeborn (in build file:/root/celeborn/)
[info] In file:/root/celeborn/
[info]   * celeborn
[info]     celeborn-client
[info]     celeborn-client-flink-1_16
[info]     celeborn-client-flink-1_16-shaded
[info]     celeborn-common
[info]     celeborn-flink-common
[info]     celeborn-flink-group
[info]     celeborn-flink-it
[info]     celeborn-master
[info]     celeborn-service
[info]     celeborn-worker

By using these profiles, you can easily switch between different client modules for Spark and Flink. These profiles enable specific dependencies and configurations relevant to the chosen version. This way, you can conveniently manage and build the desired configurations of the Celeborn project.

For example, you can build the Spark 3.3 client assembly jar by running the following commands:

$ ./build/sbt -Pspark-3.3
> project celeborn-client-spark-3-shaded
> assembly

$ # Or, you can use sbt directly with the `-Pspark-3.3` profile:
$ ./build/sbt -Pspark-3.3 celeborn-client-spark-3-shaded/assembly

Similarly, you can build the Flink 1.16 client assembly jar using the following commands:

$ ./build/sbt -Pflink-1.16
> project celeborn-client-flink-1_16-shaded
> assembly

$ # Or, you can use sbt directly with the `-Pflink-1.16` profile:
$ ./build/sbt -Pflink-1.16 celeborn-client-flink-1_16-shaded/assembly

By executing these commands, you will create assembly jar files for the respective Spark and Flink client modules. The assembly jar bundles all the dependencies, allowing the client module to be used independently with all required dependencies included.

Building submodules individually

For instance, you can build the Celeborn Master module using:

$ # sbt
$ ./build/sbt
> project celeborn-master
> package

$ # Or, you can build the celeborn-master module with sbt directly using:
$ ./build/sbt celeborn-master/package

Testing with SBT

To run all tests for the Celeborn project, you can use the following command:

./build/sbt test

Running tests for specific versions of Spark/Flink client.

For example, to run the test cases for the Spark 3.3 client, use the following command:

$ ./build/sbt -Pspark-3.3 test

$ # only run spark client related modules tests
$ ./build/sbt -Pspark-3.3 celeborn-spark-group/test

Similarly, to run the test cases for the Flink 1.16 client, use the following command:

$ ./build/sbt -Pflink-1.16 test

$ # only run flink client related modules tests
$ ./build/sbt -Pflink-1.16 celeborn-flink-group/test

Running Individual Tests

When developing locally, it's often convenient to run a single test or a few tests rather than the entire test suite.

The fastest way to run individual tests is to keep an sbt console open and use it to re-run tests as necessary. For example, to run all the tests in a particular project, e.g., master:

$ ./build/sbt
> project celeborn-master
> test

You can run a single test suite using the testOnly command. For example, to run the SlotsAllocatorSuiteJ:

> testOnly org.apache.celeborn.service.deploy.master.SlotsAllocatorSuiteJ

The testOnly command accepts wildcards; e.g., you can also run the SlotsAllocatorSuiteJ with:

> testOnly *SlotsAllocatorSuiteJ

Or you could run all the tests in the master package:

> testOnly org.apache.celeborn.service.deploy.master.*

If you'd like to run just a single Java test in the SlotsAllocatorSuiteJ, e.g., a test with the name testAllocateSlotsForSinglePartitionId, run the following command in the sbt console:

> testOnly *SlotsAllocatorSuiteJ -- *SlotsAllocatorSuiteJ.testAllocateSlotsForSinglePartitionId

If you'd like to run just a single Scala test in the MasterSuite, e.g., a test that includes "test single node startup functionality" in the name, run the following command in the sbt console:

> testOnly *MasterSuite -- -z "test single node startup functionality"

If you'd prefer, you can run all of these commands on the command line (but this will be slower than running tests from an open console). To do this, you need to surround testOnly and the arguments that follow it in quotes:

$ ./build/sbt "celeborn-master/testOnly *MasterSuite -- -z \"test single node startup functionality\""
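The nested quoting in the command above is easy to get wrong by hand. As a minimal sketch, a small shell function can assemble the quoted argument. The name `sbt_test_only_arg` is hypothetical and not part of the Celeborn build scripts, and the `-z` filter is assumed to target a ScalaTest suite, as in the example above.

```shell
# Hypothetical helper (not part of Celeborn's build scripts): builds the single
# argument passed to ./build/sbt for a testOnly invocation.
#   $1 = sbt project, $2 = suite pattern, $3 = optional test-name substring (-z)
sbt_test_only_arg() {
  local project="$1" suite="$2" name="${3:-}"
  if [ -n "$name" ]; then
    # Embed the test name in double quotes so sbt sees it as one argument.
    printf '%s/testOnly %s -- -z "%s"' "$project" "$suite" "$name"
  else
    printf '%s/testOnly %s' "$project" "$suite"
  fi
}
```

It could then be used as `./build/sbt "$(sbt_test_only_arg celeborn-master '*MasterSuite' 'test single node startup functionality')"`. Quoting the suite pattern keeps the shell from expanding the `*` wildcard before sbt sees it.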

For more about how to run individual tests with sbt, see the sbt documentation and JUnit Interface.

Accelerating SBT

This section provides instructions on setting up repository mirrors or proxies for a smoother SBT experience. Depending on your location and network conditions, you can choose the appropriate approach to accelerate SBT startup and enhance dependency retrieval.

Accelerating SBT Startup

The SBT startup process involves fetching the SBT bootstrap jar, which is typically obtained from the Maven Central Repository (https://repo1.maven.org/maven2/). If you encounter slow access to this repository or if it's inaccessible in your network environment, you can expedite the SBT startup by configuring a custom artifact repository using the DEFAULT_ARTIFACT_REPOSITORY environment variable.

$ # The following command fetches sbt-launch-x.y.z.jar from https://maven.aliyun.com/nexus/content/groups/public/
$ # Ensure that the URL ends with a trailing slash "/"
$ export DEFAULT_ARTIFACT_REPOSITORY=https://maven.aliyun.com/nexus/content/groups/public/
$ ./build/sbt

This will initiate SBT using the specified repository, allowing for faster download and startup times.
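Because the URL must end with a trailing slash, it may help to normalize the value before exporting it. The following is a minimal sketch; `with_trailing_slash` is a hypothetical helper name and is not provided by the Celeborn build scripts.

```shell
# Hypothetical helper: prints the given repository URL, appending a
# trailing "/" only when it is missing.
with_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

# Example usage (commented out so that sourcing this snippet has no side effects):
#   export DEFAULT_ARTIFACT_REPOSITORY="$(with_trailing_slash "https://maven.aliyun.com/nexus/content/groups/public")"
#   ./build/sbt
```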

Custom SBT Repositories

The current repositories embedded within the Celeborn project are detailed below:

[repositories]
  local
  mavenLocal: file://${user.home}/.m2/repository/
  local-preloaded-ivy: file:///${sbt.preloaded-${sbt.global.base-${user.home}/.sbt}/preloaded/}, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext]
  local-preloaded: file:///${sbt.preloaded-${sbt.global.base-${user.home}/.sbt}/preloaded/}
  # The system property value of `celeborn.sbt.default.artifact.repository` is
  # fetched from the environment variable `DEFAULT_ARTIFACT_REPOSITORY` and
  # assigned within the build/sbt-launch-lib.bash script.
  private: ${celeborn.sbt.default.artifact.repository-file:///dev/null}
  gcs-maven-central-mirror: https://maven-central.storage-download.googleapis.com/repos/central/data/
  maven-central
  typesafe-ivy-releases: https://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  sbt-ivy-snapshots: https://repo.scala-sbt.org/scalasbt/ivy-snapshots/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  sbt-plugin-releases: https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
  bintray-typesafe-sbt-plugin-releases: https://dl.bintray.com/typesafe/sbt-plugins/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
  bintray-spark-packages: https://dl.bintray.com/spark-packages/maven/
  typesafe-releases: https://repo.typesafe.com/typesafe/releases/

For many developers, download speeds from the default repositories are poor. To address this, the project provides verified public mirror templates for regions with a significant local developer presence: repositories-cn.template for developers in the Chinese mainland, and repositories-asia.template for developers across Asia. To speed up dependency downloads, copy the appropriate template to build/sbt-config/repositories-local, for example:

cp build/sbt-config/repositories-cn.template build/sbt-config/repositories-local
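A plain `cp` silently overwrites any repositories-local file that already contains local edits. The sketch below guards against that; `install_repo_template` is a hypothetical helper, and the only file names it relies on are the template naming shown above.

```shell
# Hypothetical helper: copies a region template to repositories-local,
# but keeps an existing repositories-local file untouched.
#   $1 = region suffix (e.g. cn or asia)
#   $2 = config directory (defaults to build/sbt-config)
install_repo_template() {
  local region="$1" dir="${2:-build/sbt-config}"
  local src="$dir/repositories-$region.template"
  local dst="$dir/repositories-local"
  if [ ! -f "$src" ]; then
    echo "template not found: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "keeping existing $dst" >&2
    return 0
  fi
  cp "$src" "$dst"
}
```

Running `install_repo_template cn` from the project root would then have the same effect as the `cp` command above when no local file exists yet.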

Developers in other regions are strongly encouraged to contribute templates tailored to their own areas.

!!! note
    1. build/sbt-config/repositories-local takes precedence over build/sbt-config/repositories and is excluded from version control by .gitignore.
    2. If the environment variable DEFAULT_ARTIFACT_REPOSITORY is set, it has the highest priority among non-local repositories.
    3. Repository priority is determined by file order; repositories listed earlier take precedence.

Similarly, to compile and package within an intranet environment, you can edit build/sbt-config/repositories-local as shown below:

[repositories]
  local
  mavenLocal: file://${user.home}/.m2/repository/
  private: ${celeborn.sbt.default.artifact.repository-file:///dev/null}
  private-central: https://example.com/repository/maven/
  private-central-http: http://example.com/repository/maven/, allowInsecureProtocol

allowInsecureProtocol is required if you want to use a repository that supports only HTTP and not HTTPS; otherwise, an error is raised (insecure HTTP request is unsupported). For details, please refer to the sbt Launcher Configuration.

For more details on sbt repository configuration, please refer to the SBT documentation.

Publish

SBT supports publishing shade clients (Spark/Flink/MapReduce) to a private Maven repository, such as Sonatype Nexus or JFrog.

Before executing the publish command, ensure that the following environment variables are correctly set:

| Environment Variable | Description |
|---|---|
| ASF_USERNAME | Sonatype repository username |
| ASF_PASSWORD | Sonatype repository password |
| SONATYPE_SNAPSHOTS_URL | Sonatype repository URL for snapshot version releases, default is "https://repository.apache.org/content/repositories/snapshots" |
| SONATYPE_RELEASES_URL | Sonatype repository URL for official release versions, default is "https://repository.apache.org/service/local/staging/deploy/maven2" |

For example:

export SONATYPE_SNAPSHOTS_URL=http://192.168.3.46:8081/repository/maven-snapshots/
export SONATYPE_RELEASES_URL=http://192.168.3.46:8081/repository/maven-releases/
export ASF_USERNAME=admin
export ASF_PASSWORD=123456
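Since a publish run can fail late when credentials are missing, a quick pre-flight check may be useful. This is a minimal sketch: `missing_publish_vars` is a hypothetical function, and it checks only ASF_USERNAME and ASF_PASSWORD on the assumption that the two URL variables fall back to the defaults listed above.

```shell
# Hypothetical pre-flight check: prints the name of each required publish
# variable that is unset or empty, and returns non-zero if any is missing.
missing_publish_vars() {
  local missing=0 name value
  for name in ASF_USERNAME ASF_PASSWORD; do
    # Indirect lookup of the variable named by $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_publish_vars || echo "set the variables listed above before publishing" >&2` reports the problem before any sbt command is started.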

Publish the shade client for Spark 3.5:

$ ./build/sbt -Pspark-3.5 celeborn-client-spark-3-shaded/publish

Publish the shade client for Spark 4.0:

$ ./build/sbt -Pspark-4.0 celeborn-client-spark-4-shaded/publish

Publish the shade client for Flink 1.18:

$ ./build/sbt -Pflink-1.18 celeborn-client-flink-1_18-shaded/publish

Publish the shade client for MapReduce:

$ ./build/sbt -Pmr celeborn-client-mr-shaded/publish
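The publish commands above appear to follow a naming pattern: Spark profiles map to a module named after the major version only, Flink profiles replace the dot with an underscore, and MapReduce uses a fixed name. The sketch below encodes that inferred pattern. `shaded_module_for` is a hypothetical helper, and the mapping is an assumption drawn from the four examples here, so verify the module name with `./build/sbt -P<profile> projects` before relying on it.

```shell
# Hypothetical helper: maps a profile name (without the -P prefix) to the
# shaded client module, following the pattern seen in the examples above.
# This is an inferred convention, not an official interface.
shaded_module_for() {
  local profile="$1" version
  case "$profile" in
    spark-*)
      version="${profile#spark-}"                # e.g. 3.5
      # Spark modules are named after the major version only.
      echo "celeborn-client-spark-${version%%.*}-shaded" ;;
    flink-*)
      version="${profile#flink-}"                # e.g. 1.18
      # Flink modules keep the full version, with "." replaced by "_".
      echo "celeborn-client-flink-$(printf '%s' "$version" | tr . _)-shaded" ;;
    mr)
      echo "celeborn-client-mr-shaded" ;;
    *)
      echo "unknown profile: $profile" >&2
      return 1 ;;
  esac
}
```

With this in place, a publish command could be written as `./build/sbt -Pflink-1.18 "$(shaded_module_for flink-1.18)/publish"`.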

Make sure to complete the necessary build and testing before executing the publish commands.