Commit Graph

12 Commits

Author SHA1 Message Date
SteNicholas
7fa1d32a98 [CELEBORN-1374] Refactor SortBuffer and PartitionSortedBuffer
### What changes were proposed in this pull request?

Refactor `SortBuffer` and `PartitionSortedBuffer` with the introduction of `DataBuffer` and `SortBasedDataBuffer`.

### Why are the changes needed?

`SortBuffer` and `PartitionSortedBuffer` were refactored in https://github.com/apache/flink/pull/18505. Celeborn Flink should also refactor `SortBuffer` and `PartitionSortedBuffer` to stay in sync with the interface changes in Flink. Meanwhile, `SortBuffer` and `PartitionSortedBuffer` should distinguish between channel and subpartition for https://github.com/apache/flink/pull/23927.

### Does this PR introduce _any_ user-facing change?

- `SortBuffer` is renamed to `DataBuffer`.
- `PartitionSortedBuffer` is renamed to `SortBasedDataBuffer`.
- `SortBuffer.BufferWithChannel` is renamed to `BufferWithSubpartition`.
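Downstream code that still uses the old names could be updated with a mechanical rename along these lines. This is a minimal sketch, not part of the PR. It assumes GNU sed, which supports `\b` word boundaries. It only rewrites identifiers, so package names and imports, which may also have changed, still need to be checked by hand. The helper name `rename_celeborn_buffers` is hypothetical. Treating an unqualified `BufferWithChannel` the same way as `SortBuffer.BufferWithChannel` is also an assumption.

```bash
# Hypothetical helper: applies the renames listed above to text on stdin.
# Requires GNU sed for \b word boundaries; writes the result to stdout.
rename_celeborn_buffers() {
  sed \
    -e 's/\bSortBuffer\.BufferWithChannel\b/BufferWithSubpartition/g' \
    -e 's/\bPartitionSortedBuffer\b/SortBasedDataBuffer/g' \
    -e 's/\bSortBuffer\b/DataBuffer/g' \
    -e 's/\bBufferWithChannel\b/BufferWithSubpartition/g'
}
```

For example, `rename_celeborn_buffers < MyWriter.java > MyWriter.java.new` (file names hypothetical). The word boundaries keep unrelated identifiers such as `MySortBuffer` untouched.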

### How was this patch tested?

UT and IT.

Closes #2448 from SteNicholas/CELEBORN-1374.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-04-09 15:47:57 +08:00
Mridul Muralidharan
b1f8ec8357 [CELEBORN-1351] Introduce SSLFactory and enable TLS support
### What changes were proposed in this pull request?

Add `SSLFactory` and wire up TLS support with the rest of Celeborn to enable secure over-the-wire communication.

### Why are the changes needed?
Add support for TLS to secure wire communication.
This is the last PR to add basic support for TLS.
There will be a follow-up for CELEBORN-1356, and documentation, of course!

### Does this PR introduce _any_ user-facing change?
Yes, completes basic support for TLS in Celeborn.

### How was this patch tested?
Existing tests, augmented with additional unit tests.

Closes #2438 from mridulm/add-sslfactory-and-related-changes.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-04-08 10:42:29 +08:00
Mridul Muralidharan
3ff8812cdd [CELEBORN-1348] Update infrastructure for SSL communication
### What changes were proposed in this pull request?

Update infrastructure for SSL support.
Please see #2416 for the consolidated PR with all the changes for reference.

### Why are the changes needed?

At a high level, the changes are:
* `ManagedBuffer.convertToNettyForSsl` to support SSL encryption.
* Add `EncryptedMessageWithHeader`, which is used to wrap the message and body for use with SSL.
* `SslMessageEncoder`, an encoder for SSL.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

The overall PR #2416 (and this PR as well) passes all tests, and this PR includes the relevant subset of tests.

Closes #2427 from mridulm/update-infra-for-ssl.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-04-01 19:59:44 +08:00
SteNicholas
6fdeced158 [CELEBORN-1359] Support Netty Logging at the network layer
### What changes were proposed in this pull request?

Support Netty-level logging at the network layer for Celeborn. To configure Netty-level logging, a LogHandler must be added to the channel pipeline. `NettyLogger` is introduced as a new class that constructs a log handler depending on the log level:

- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="DEBUG" additivity="false">`: a custom log handler is created that does not dump the message contents, which keeps the log more compact. Moreover, when network-level encryption is switched on, this level might be sufficient.
- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="TRACE" additivity="false">`: Netty's own log handler is used, which dumps the message contents.
- Otherwise (when the logger level is neither `TRACE` nor `DEBUG`), the pipeline does not contain a log handler. There is no runtime penalty for the default setting, but a long-running service must be restarted for a new log level to take effect.
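The level-to-handler mapping above amounts to a three-way decision table. The bash function below only restates that table for quick reference. The function name and output strings are hypothetical, and the actual selection is done inside `NettyLogger` itself.

```bash
# Illustrative only: restates which handler the text above says ends up
# in the channel pipeline for a given NettyLogger logger level.
netty_log_handler_for_level() {
  case "$1" in
    TRACE) echo "netty-log-handler (dumps message contents)" ;;
    DEBUG) echo "custom-log-handler (no message contents)" ;;
    *)     echo "none" ;;
  esac
}
```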

Backport:

- [[SPARK-36719][CORE] Supporting Netty Logging at the network layer](https://github.com/apache/spark/pull/33962)
- [[SPARK-45377][CORE] Handle InputStream in NettyLogger](https://github.com/apache/spark/pull/43165)

### Why are the changes needed?

This level of logging proved to be sufficient while debugging some external-shuffle-related problems. Compared with tcpdump, these log lines can be correlated more easily with Celeborn internal calls. Moreover, the log layout can be configured to include thread names, so that in case of a timeout a busy thread can be identified.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested locally.

Closes #2423 from SteNicholas/CELEBORN-1359.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-28 16:11:37 +08:00
Mridul Muralidharan
b14254be9a [CELEBORN-1349] Add SSL related configs and support for ReloadingX509TrustManager
Add SSL-related configs and support for `ReloadingX509TrustManager`, which are required for enabling SSL support.
Please see #2416 for the consolidated PR with all the changes for reference.

Introduces SSL-related configs for enabling and configuring the use of TLS.

Yes, introduces configs to control the behavior of SSL.

The overall PR #2411 (and this PR as well) passes all tests; this PR specifically pulls out the `ReloadingX509TrustManager` and config-related changes.

Closes #2419 from mridulm/config-for-ssl.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-27 18:21:14 +08:00
sychen
b3eed34b57 [CELEBORN-1293] Output received signals at master and worker
### What changes were proposed in this pull request?
When the master or worker is shut down, the received signal is logged as a record.

### Why are the changes needed?
This makes it convenient to track the status of the master and workers.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Local test.

```bash
./sbin/stop-all.sh
```

```
12:20:59.932 [SIGTERM handler] ERROR org.apache.celeborn.service.deploy.master.Master - RECEIVED SIGNAL TERM
```

```
12:20:59.563 [SIGTERM handler] ERROR org.apache.celeborn.service.deploy.worker.Worker - RECEIVED SIGNAL TERM
```
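To track shutdowns across several nodes, these records can be extracted from the logs. The sketch below assumes the log layout shown above; the exact pattern depends on the configured log layout. The helper name `received_signals` is hypothetical, and where the log files live depends on your deployment.

```bash
# Hypothetical helper: reads Celeborn log lines on stdin and prints
# "<component> <signal>" for each "RECEIVED SIGNAL" record, assuming the
# logger name looks like org.apache.celeborn.service.deploy.<component>.<Class>.
received_signals() {
  sed -n 's/.*\.deploy\.\([a-z]*\)\.[A-Za-z]* - RECEIVED SIGNAL \([A-Z]*\).*/\1 \2/p'
}
```

For example, `received_signals < master.log` (file name hypothetical) would print `master TERM` for the first record above.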

Closes #2334 from cxzl25/CELEBORN-1293.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-03-08 15:48:57 +08:00
mingji
fd944b2509 [CELEBORN-1250][FOLLOWUP] Fix license issues
### What changes were proposed in this pull request?

As the title says.

### Why are the changes needed?

Fix license issues for the main branch.

Cherry-pick https://github.com/apache/incubator-celeborn/pull/2259 and https://github.com/apache/incubator-celeborn/pull/2268 into the main branch.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Passes GA (GitHub Actions).

Closes #2271 from cfmcgrady/license-main.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-30 16:45:21 +08:00
Cheng Pan
bb86074163 [CELEBORN-1202][FOLLOWUP] Update LICENSE and NOTICE files
### What changes were proposed in this pull request?

Update LICENSE and NOTICE files according to the mailing list comments.

### Why are the changes needed?

https://lists.apache.org/thread/zw5cw621dqgbktdolx7qynho0zt451pk

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Review.

Closes #2213 from pan3793/CELEBORN-1202-followup.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-01-10 19:26:54 +08:00
mingji
b4b86848e3 [CELEBORN-1202] LICENSE mentions third-party components under other open source licenses
### What changes were proposed in this pull request?

`LICENSE` now mentions third-party components under other open source licenses, such as Apache Spark.

### Why are the changes needed?

`LICENSE` mentions only one third-party file, from Guava. However, `NOTICE` lists both Apache Spark and Apache Flink. `LICENSE` should mention all third-party components under other open source licenses.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2193 from SteNicholas/CELEBORN-1202.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-29 11:35:50 +08:00
Cheng Pan
78553f1418 [CELEBORN-1003] Correct the LICENSE and NOTICE for shaded client jars
### What changes were proposed in this pull request?

Correct the `LICENSE` and `NOTICE` for the following shaded client jars:

- `celeborn-client-flink-1.14-shaded_2.12-<version>.jar`
- `celeborn-client-flink-1.15-shaded_2.12-<version>.jar`
- `celeborn-client-flink-1.17-shaded_2.12-<version>.jar`
- `celeborn-client-mr-shaded_2.12-<version>.jar`
- `celeborn-client-spark-2-shaded_2.11-<version>.jar`
- `celeborn-client-spark-3-shaded_2.12-<version>.jar`

### Why are the changes needed?

The `LICENSE` and `NOTICE` shipped in a jar should match the content of the jar; for shaded jars, they should acknowledge all the bundled third-party classes.

See more discussion at https://lists.apache.org/thread/8v4wy5o132rpsjync6465zztgjlf6h5p

To determine which third-party jars are bundled, take `celeborn-client-spark-3-shaded_2.12-<version>.jar` as an example. The following command performs the packaging, and the bundled jars can be identified from log lines like `Including ... in the shaded jar`:

```bash
build/mvn clean package -DskipTests -pl :celeborn-client-spark-3-shaded_2.12 -am -Pspark-3.3
```

```
[INFO] --- maven-shade-plugin:3.4.0:shade (default) @ celeborn-client-spark-3-shaded_2.12 ---
[INFO] Including org.apache.celeborn:celeborn-client-spark-3_2.12:jar:0.4.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.celeborn:celeborn-common_2.12:jar:0.4.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.12.0 in the shaded jar.
[INFO] Including io.netty:netty-all:jar:4.1.93.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.93.Final in the shaded jar.
...
[INFO] Excluding org.apache.ratis:ratis-common:jar:2.5.1 from the shaded jar.
[INFO] Excluding org.apache.ratis:ratis-thirdparty-misc:jar:1.0.4 from the shaded jar.
[INFO] Excluding org.apache.ratis:ratis-proto:jar:2.5.1 from the shaded jar.
...
```
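These log lines can also be filtered mechanically, which helps when cross-checking `LICENSE` and `NOTICE` against what is actually bundled. The sketch below assumes the `[INFO] Including <coordinates> in the shaded jar.` line format shown above. The helper name `included_artifacts` is hypothetical.

```bash
# Hypothetical helper: extracts the Maven coordinates that maven-shade-plugin
# reports as bundled, ignoring "Excluding ..." lines, and de-duplicates them.
included_artifacts() {
  sed -n 's/^\[INFO\] Including \(.*\) in the shaded jar\.$/\1/p' | sort -u
}
```

Usage, piping the packaging command above into it: `build/mvn clean package -DskipTests -pl :celeborn-client-spark-3-shaded_2.12 -am -Pspark-3.3 | included_artifacts`. If the build output is colorized, Maven's batch mode (`-B`) disables colors so that the pattern still matches.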

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.

Closes #1933 from pan3793/CELEBORN-1003.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-28 19:23:54 +08:00
Ethan Feng
a43e3141bc [CELEBORN-224][FOLLOWUP] Correct license and notices. (#1189)
2023-02-02 10:52:11 +08:00
Alibaba OSS
0d29f88ada Initial commit
2021-12-10 16:57:16 +08:00