17 Commits

Author SHA1 Message Date
codenohup
0fa600ade1 [CELEBORN-2055] Fix some typos
### What changes were proposed in this pull request?
Inspired by [FLINK-38038](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-38038?filter=allissues), I used [Tongyi Lingma](https://lingma.aliyun.com/) and the qwen3-thinking LLM to identify and fix typos in the Celeborn codebase. For example:
- backLog → backlog
- won`t → won't
- can to be read → can be read
- mapDataPartition → mapPartitionData
- UserDefinePasswordAuthenticationProviderImpl → UserDefinedPasswordAuthenticationProviderImpl

### Why are the changes needed?
Remove typos to improve source code readability for users and ease development for developers.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Code and documentation cleanup does not require additional testing.

Closes #3356 from codenohup/fix-typo.

Authored-by: codenohup <huangxu.walker@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2025-07-10 12:01:02 +08:00
SteNicholas
dac0f56e94 [CELEBORN-1056][FOLLOWUP] Support testing of dynamic configuration management cli
### What changes were proposed in this pull request?

Support testing of dynamic configuration management cli.

### Why are the changes needed?

The dynamic configuration management CLI tests were disabled because dynamic configuration is not enabled in unit tests. Enabling it makes these CLI commands testable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`TestCelebornCliCommands`.

Closes #3340 from SteNicholas/CELEBORN-1056.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-06-22 21:25:09 -07:00
SteNicholas
46c998067e [CELEBORN-1056][FOLLOWUP] Support upsert and delete of dynamic configuration management
### What changes were proposed in this pull request?

Support upsert and delete of dynamic configuration management.

### Why are the changes needed?

Dynamic configuration management only provides an interface for listing dynamic configurations. It should also support upserting and deleting them.

### Does this PR introduce _any_ user-facing change?

- REST API:
  - `/api/v1/conf/dynamic/upsert` to upsert dynamic configurations.
  - `/api/v1/conf/dynamic/delete` to delete dynamic configurations.
- CLI:
  - `--upsert-dynamic-conf` to upsert dynamic configurations.
  - `--delete-dynamic-conf` to delete dynamic configurations.
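A minimal sketch of the upsert and delete semantics these endpoints expose, assuming a flat key/value configuration map. `DynamicConfStore` is a hypothetical name; this is not Celeborn's `ConfigService`, and the real endpoints may scope configurations differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating upsert/delete semantics;
// NOT Celeborn's ConfigService.
class DynamicConfStore {
    private final Map<String, String> confs = new HashMap<>();

    // Upsert: insert each key if absent, otherwise overwrite its value.
    void upsert(Map<String, String> entries) {
        confs.putAll(entries);
    }

    // Delete: remove each listed key; keys that are absent are ignored.
    void delete(Iterable<String> keys) {
        for (String key : keys) {
            confs.remove(key);
        }
    }

    // Listing returns a read-only snapshot of the current configuration.
    Map<String, String> list() {
        return Map.copyOf(confs);
    }
}
```

Upserting an existing key overwrites it, and deleting a missing key is a no-op, so both operations are safe to retry.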

### How was this patch tested?

- `ConfigServiceSuiteJ`
- `ApiV1BaseResourceSuite`
- `TestCelebornCliCommands`

Closes #3323 from SteNicholas/CELEBORN-1056.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-06-17 14:54:50 -07:00
Wang, Fei
68f32303cd [CELEBORN-1572][FOLLOWUP] Support to show Celeborn CLI version for sub command
### What changes were proposed in this pull request?
Support showing the Celeborn CLI version for subcommands.

### Why are the changes needed?

`celeborn-cli master -V` and `celeborn-cli worker -V` print nothing.

```
(base) ➜  apache-celeborn-0.6.0-bin-ebay ./sbin/celeborn-cli -V
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Celeborn CLI - Celeborn 0.6.0
(base) ➜  apache-celeborn-0.6.0-bin-ebay ./sbin/celeborn-cli master -V
(base) ➜  apache-celeborn-0.6.0-bin-ebay ./sbin/celeborn-cli worker -V
(base) ➜  apache-celeborn-0.6.0-bin-ebay ./sbin/celeborn-cli master -h
Usage: celeborn-cli master [-hV] [--apps=appId] [--auth-header=authHeader]
...
(base) ➜  apache-celeborn-0.6.0-bin-ebay ./sbin/celeborn-cli worker -h
Usage: celeborn-cli worker [-hV] [--apps=appId] [--auth-header=authHeader]
...
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

UT.

```
(base) ➜  celeborn git:(cli_version) ./dist/sbin/celeborn-cli -V
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Celeborn CLI - Celeborn 0.7.0-SNAPSHOT
(base) ➜  celeborn git:(cli_version) ./dist/sbin/celeborn-cli master -V
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Celeborn CLI - Celeborn 0.7.0-SNAPSHOT
(base) ➜  celeborn git:(cli_version) ./dist/sbin/celeborn-cli worker -V
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Celeborn CLI - Celeborn 0.7.0-SNAPSHOT
(base) ➜  celeborn git:(cli_version)

```

Closes #3321 from turboFei/cli_version.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2025-06-11 14:00:20 +08:00
Aravind Patnam
ebfa1d8cf4 [CELEBORN-2014] updateInterruptionNotice REST API
### What changes were proposed in this pull request?
This PR is part of [CIP17: Interruption Aware Slot Selection](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-17%3A+Interruption+Aware+Slot+Selection).
It introduces a REST API that external services can use to notify the master about interruptions and schedules.

### Why are the changes needed?
To notify the master of upcoming interruptions in the worker fleet. The master can then proactively deprioritize workers that are likely to be interrupted sooner.
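A minimal sketch of interruption-aware ordering, assuming each worker carries the time of its next known interruption (`Long.MAX_VALUE` when no notice has been received). The `Worker` record and `prioritize` method are hypothetical; the actual CIP-17 slot-selection logic may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of interruption-aware worker ordering;
// not Celeborn's slot-selection code.
class InterruptionAwareOrdering {
    // nextInterruptionMillis: epoch millis of the next known interruption,
    // or Long.MAX_VALUE when no interruption notice exists for the worker.
    record Worker(String host, long nextInterruptionMillis) {}

    // Workers interrupted later (or never) come first; those interrupted
    // soonest come last, so they are picked only when others are exhausted.
    static List<Worker> prioritize(List<Worker> workers) {
        List<Worker> sorted = new ArrayList<>(workers);
        Comparator<Worker> soonestFirst =
            Comparator.comparingLong(Worker::nextInterruptionMillis);
        sorted.sort(soonestFirst.reversed());
        return sorted;
    }
}
```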

### Does this PR introduce _any_ user-facing change?
Yes, a new REST API.

### How was this patch tested?
Added unit tests.

Closes #3285 from akpatnam25/CELEBORN-2014.

Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2025-06-06 14:07:49 +08:00
Wang, Fei
5a50686ef6 [CELEBORN-2020][FOLLOWUP] Fix CLI master commands authentication testing
### What changes were proposed in this pull request?

Fix the remaining master subcommands that do not pass the auth header.

### Why are the changes needed?

Previously, GA did not detect this mistake because the authentication configs were not passed to the mini Celeborn cluster during setup.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

UT.

Closes #3303 from turboFei/auth_header_followup.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: 子懿 <ziyi.jxf@antgroup.com>
2025-06-03 20:32:43 +08:00
Wang, Fei
5f58fb1e3e [CELEBORN-2020] Support http authentication for Celeborn CLI
### What changes were proposed in this pull request?

Support HTTP authentication for the Celeborn CLI.

### Why are the changes needed?

The current CLI does not work when authentication is enabled on the master or worker.
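The CLI usage text in this log lists an `--auth-header=authHeader` option. Assuming the server uses HTTP Basic authentication and that the option expects the full `Authorization` header value (both assumptions; other schemes need a different value), such a value could be built like this. `BasicAuthHeader` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds an HTTP Basic "Authorization" header value (RFC 7617).
// Assumption: the master/worker HTTP server is configured for Basic auth.
class BasicAuthHeader {
    static String basic(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, `basic("user", "pass")` returns `Basic dXNlcjpwYXNz`.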

### Does this PR introduce _any_ user-facing change?
Yes, a new `--auth-header` option.

### How was this patch tested?

UT.

Closes #3300 from turboFei/cli_auth.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2025-05-30 00:28:41 -07:00
SteNicholas
cc501928ce [CELEBORN-1875][FOLLOWUP] Support master --show-workers-topology command to show registered workers topology
### What changes were proposed in this pull request?

Follows up #3112.

Support the `master --show-workers-topology` command to show the registered workers topology.

### Why are the changes needed?

The REST API `/api/v1/workers/topology` returns the grouped workers topology, but there is no CLI command to show the registered workers topology.
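To illustrate what grouped topology info means, here is a sketch that groups worker hosts by a network location such as a rack. The `Worker` record, its `networkLocation` field, and the rack names are assumptions, not the endpoint's actual response schema.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: group registered workers by a topology key.
// Not the REST endpoint's implementation.
class WorkerTopology {
    record Worker(String host, String networkLocation) {}

    static Map<String, List<String>> groupByLocation(List<Worker> workers) {
        return workers.stream().collect(Collectors.groupingBy(
            Worker::networkLocation,
            TreeMap::new, // sorted keys for stable output
            Collectors.mapping(Worker::host, Collectors.toList())));
    }
}
```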

### Does this PR introduce _any_ user-facing change?

Introduce `master --show-workers-topology` command.

### How was this patch tested?

`TestCelebornCliCommands#master --show-workers-topology`

Closes #3119 from SteNicholas/CELEBORN-1875.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2025-02-27 10:44:51 +08:00
Wang, Fei
59163c2a23 [CELEBORN-1745] Remove application top disk usage code
### What changes were proposed in this pull request?
Remove the application top disk usage code from both the master and the worker.

Instead, use the Prometheus expression below to find the applications with the highest disk usage.
```
topk(50, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))
```
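The query above can be read as the following Java sketch over in-memory samples: apply the label matchers, sum per `applicationId`, then keep the `k` largest sums. The `Sample` record is hypothetical; Prometheus evaluates the query over live time series, not a list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative equivalent of
//   topk(k, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))
// over in-memory samples. The Sample type is hypothetical.
class TopDiskUsage {
    record Sample(String role, String applicationId, double value) {}

    static List<Map.Entry<String, Double>> topK(List<Sample> samples, int k) {
        Map<String, Double> perApp = new HashMap<>();
        for (Sample s : samples) {
            // Label matchers: role="worker" and applicationId!=""
            if (!"worker".equals(s.role()) || s.applicationId().isEmpty()) {
                continue;
            }
            // sum by (applicationId): add up values across workers
            perApp.merge(s.applicationId(), s.value(), Double::sum);
        }
        // topk(k, ...): keep the k largest sums, largest first
        return perApp.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(k)
            .collect(Collectors.toList());
    }
}
```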

### Why are the changes needed?
To address the review comment: https://github.com/apache/celeborn/pull/2947#issuecomment-2499564978

> Due to the application dimension resource consumption, this feature should be included in the deprecated features. Maybe you can remove the codes for application top disk usage.

### Does this PR introduce _any_ user-facing change?

Yes, the app top disk usage API is removed.

### How was this patch tested?
GA.

Closes #2949 from turboFei/remove_app_top_usage.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-11-28 10:55:34 +08:00
Wang, Fei
be4c02e6d0 [CELEBORN-1601][FOLLOWUP] Refine the RESTful apis for revise lost shuffles
### What changes were proposed in this pull request?

1. `GET /api/v1/applications/deleteApps`  -> `DELETE /api/v1/applications`

2. `GET /api/v1/applications/reviseLostShuffles`  -> `POST /api/v1/applications/revise_lost_shuffles`
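A sketch of the two refined requests using `java.net.http`. The base URL and the JSON bodies are placeholders, not the documented request schemas; the requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) requests for the refined endpoints.
// Base URL and JSON bodies are hypothetical placeholders.
class RefinedAppRequests {
    static HttpRequest deleteApps(String baseUrl, String jsonBody) {
        // Was: GET /api/v1/applications/deleteApps
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/applications"))
            .header("Content-Type", "application/json")
            .method("DELETE", HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    static HttpRequest reviseLostShuffles(String baseUrl, String jsonBody) {
        // Was: GET /api/v1/applications/reviseLostShuffles
        return HttpRequest.newBuilder(
                URI.create(baseUrl + "/api/v1/applications/revise_lost_shuffles"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

Using `DELETE` and `POST` instead of `GET` signals that these calls change server state.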

### Why are the changes needed?
Follow-up for https://github.com/apache/celeborn/pull/2746

### Does this PR introduce _any_ user-facing change?

No, these APIs have not been released yet.

### How was this patch tested?
GA.

Closes #2892 from turboFei/delete_app.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-11-12 10:11:25 +08:00
Wang, Fei
7dc72b35e7 [CELEBORN-1477][FOLLOWUP] Remove scala binary version from openapi-client artifactId
### What changes were proposed in this pull request?

1. Remove the Scala binary version from the openapi-client artifactId.
2. Skip the openapi-client doc compilation, which was missed in https://github.com/apache/celeborn/pull/2641.

### Why are the changes needed?

The openapi-client is a pure Java module, so its artifactId does not need a Scala binary version.

### Does this PR introduce _any_ user-facing change?

No, it has not been released.

### How was this patch tested?
GA.

Closes #2861 from turboFei/remove_Scala.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-11-01 15:08:00 +08:00
mingji
df01fadc9f [CELEBORN-1601] Support revise lost shuffles
### What changes were proposed in this pull request?
Support revising lost shuffle IDs in long-running jobs such as Flink batch jobs.

### Why are the changes needed?
1. To support revising lost shuffles.
2. To add an HTTP endpoint to revise lost shuffles manually.
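The commit does not describe how lost shuffles are detected. Purely as an illustration, assuming an application can compare the shuffle IDs it still uses with those the master tracks, the IDs to revise are the set difference. `LostShuffles` and `findLost` are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: shuffle IDs the application still uses but the
// master no longer tracks are candidates for revision.
class LostShuffles {
    static Set<Integer> findLost(Set<Integer> inUseByApp, Set<Integer> knownToMaster) {
        Set<Integer> lost = new TreeSet<>(inUseByApp);
        lost.removeAll(knownToMaster); // set difference: inUse minus known
        return lost;
    }
}
```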

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Cluster tests.

Closes #2746 from FMX/b1600.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-10-21 16:44:37 +08:00
Aravind Patnam
fe01bac276 [CELEBORN-1599] Container Info REST API
### What changes were proposed in this pull request?
Add a REST API and CLI support for container info. Users can configure this API to work with whichever cluster manager they use.
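One way to make container info depend on the cluster manager is a provider selected by configuration. This sketch uses a hypothetical `Provider` interface and a name-keyed registry; none of these names come from Celeborn.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical pluggable container-info provider chosen by a configured
// name. Interface, class, and key names are illustrative only.
class ContainerInfo {
    interface Provider {
        Map<String, String> containerInfo();
    }

    // Fallback when no cluster-manager-specific provider is configured.
    static class DefaultProvider implements Provider {
        public Map<String, String> containerInfo() {
            return Map.of("clusterManager", "none");
        }
    }

    // Registry keyed by the configured provider name.
    static final Map<String, Supplier<Provider>> REGISTRY =
        Map.of("default", DefaultProvider::new);

    static Provider load(String configuredName) {
        Supplier<Provider> factory = REGISTRY.get(configuredName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown provider: " + configuredName);
        }
        return factory.get();
    }
}
```

A deployment on a specific cluster manager would register its own `Provider` under another name and select it through configuration.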

### Why are the changes needed?
See above.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UTs.

Closes #2758 from akpatnam25/CELEBORN-1599.

Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-10-17 17:44:08 +08:00
SteNicholas
5dea9f13d2 [CELEBORN-1593][FOLLOWUP] Support master --show-manual-excluded-workers command to show manual excluded workers
### What changes were proposed in this pull request?

Support the `master --show-manual-excluded-workers` command to show manually excluded workers.

### Why are the changes needed?

`/api/v1/workers` returns the manually excluded workers, which the `master` command should also be able to show.

### Does this PR introduce _any_ user-facing change?

Introduce `master --show-manual-excluded-workers` command.

### How was this patch tested?

`TestCelebornCliCommands#master --show-manual-excluded-workers`

Closes #2790 from SteNicholas/CELEBORN-1593.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-10-10 20:16:42 +08:00
Aravind Patnam
c4c32995d3 [CELEBORN-1593][FOLLOWUP] Add CLI support for some missing HTTP APIs
### What changes were proposed in this pull request?
Add CLI support for `removeWorkersUnavailableInfo`, `getDecommissioningWorkers` and `runIsDecommissioning`, which were missed in the previous PR.

### Why are the changes needed?
See above.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added unit tests.

Closes #2735 from akpatnam25/CELEBORN-1593.

Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-18 15:40:02 +08:00
SteNicholas
9621d1150d [CELEBORN-1572][FOLLOWUP] master --send-worker-event should support RECOMMISSION and NONE event type
### What changes were proposed in this pull request?

`master --send-worker-event` should support the `RECOMMISSION` and `NONE` event types.

### Why are the changes needed?

The legal event types of `/api/v1/workers/events` are `IMMEDIATELY`, `DECOMMISSION`, `DECOMMISSION_THEN_IDLE`, `GRACEFUL`, `RECOMMISSION` and `NONE`. Therefore, `master --send-worker-event` should also support the `RECOMMISSION` and `NONE` event types.
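A sketch of client-side validation for these six event types before a request is sent. The `WorkerEvents` class is hypothetical, and the case-insensitive parsing is a design choice of this sketch, not necessarily the CLI's behavior.

```java
import java.util.Locale;

// Hypothetical validation of the event types accepted by
// /api/v1/workers/events, as listed above.
class WorkerEvents {
    enum EventType {
        IMMEDIATELY, DECOMMISSION, DECOMMISSION_THEN_IDLE, GRACEFUL, RECOMMISSION, NONE
    }

    // Accepts any casing; rejects anything outside the legal set.
    static EventType parse(String raw) {
        try {
            return EventType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported worker event type: " + raw, e);
        }
    }
}
```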

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`TestCelebornCliCommands#master --send-worker-event`

Closes #2734 from SteNicholas/CELEBORN-1572.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-13 21:32:04 +08:00
Aravind Patnam
cc26131f88 [CELEBORN-1572] Celeborn CLI initial REST API support
### What changes were proposed in this pull request?
Introduce the Celeborn CLI (based on [CIP-7](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI)). This first iteration adds support for querying the existing REST API endpoints.
A layer for external cluster manager support will be added next. Further improvements, such as pretty printing, can be added in subsequent PRs.
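A minimal sketch of how CLI options could map onto the REST endpoints they query. The option/endpoint pairs are taken from other commits in this log (`--show-workers-topology`, `--show-manual-excluded-workers`, `--send-worker-event`); the `CliDispatch` class itself is hypothetical, not Celeborn CLI code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical table-driven mapping from master CLI options to REST paths.
class CliDispatch {
    static final Map<String, String> MASTER_OPTIONS = Map.of(
        "--show-workers-topology", "/api/v1/workers/topology",
        "--show-manual-excluded-workers", "/api/v1/workers",
        "--send-worker-event", "/api/v1/workers/events");

    // Empty when the option has no associated endpoint.
    static Optional<String> endpointFor(String option) {
        return Optional.ofNullable(MASTER_OPTIONS.get(option));
    }
}
```

Keeping the mapping in one table means adding CLI support for a new endpoint is a one-line change.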

### Why are the changes needed?
See [CIP-7](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI).

### Does this PR introduce _any_ user-facing change?
Yes, a new CLI tool.

### How was this patch tested?
Added UTs and also tested internally.

Closes #2699 from akpatnam25/cli-CELEBORN-1572.

Lead-authored-by: Aravind Patnam <apatnam@linkedin.com>
Co-authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
2024-09-05 11:15:16 -05:00