9 Commits

Author SHA1 Message Date
Wang, Fei
59163c2a23 [CELEBORN-1745] Remove application top disk usage code
### What changes were proposed in this pull request?
Remove the code for application top disk usage on both the master and worker sides.

Instead, use the Prometheus expression below to find the applications with the highest disk usage.
```
topk(50, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))
```
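
As a minimal sketch of running this expression programmatically, the snippet below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and ranks applications from a response in the standard instant-vector shape. The base URL, the helper names, and the sample payload are illustrative assumptions and are not part of this change.

```python
import json
from urllib.parse import urlencode

# The expression from this description, unchanged.
TOP_APPS_EXPR = (
    "topk(50, sum by (applicationId) "
    '(metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))'
)


def build_query_url(base_url, expr=TOP_APPS_EXPR):
    """Build an instant-query URL; base_url is a hypothetical Prometheus address."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def rank_apps(payload):
    """Return (applicationId, bytes written) pairs, largest first."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    rows = [
        (item["metric"].get("applicationId", ""), float(item["value"][1]))
        for item in payload["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample in the instant-vector response format (not real data).
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"applicationId": "app-1"}, "value": [1732762534, "1024"]},
      {"metric": {"applicationId": "app-2"}, "value": [1732762534, "2048"]}
    ]
  }
}
""")

if __name__ == "__main__":
    print(build_query_url("http://prometheus.example:9090"))
    for app_id, written in rank_apps(SAMPLE):
        print(app_id, written)
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `rank_apps` would replace the sample payload in a real setup.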

### Why are the changes needed?
To address comments: https://github.com/apache/celeborn/pull/2947#issuecomment-2499564978

> Due to the application dimension resource consumption, this feature should be included in the deprecated features. Maybe you can remove the codes for application top disk usage.

### Does this PR introduce _any_ user-facing change?

Yes, the app top disk usage API is removed.

### How was this patch tested?
GA.

Closes #2949 from turboFei/remove_app_top_usage.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-11-28 10:55:34 +08:00
Wang, Fei
be4c02e6d0 [CELEBORN-1601][FOLLOWUP] Refine the RESTful apis for revise lost shuffles
### What changes were proposed in this pull request?

1. `GET /api/v1/applications/deleteApps`  -> `DELETE /api/v1/applications`

2. `GET /api/v1/applications/reviseLostShuffles`  -> `POST /api/v1/applications/revise_lost_shuffles`
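
The renamed routes can be exercised with any HTTP client. The sketch below only builds (and does not send) requests for the two new endpoints using Python's standard library. The master address is a placeholder, the helper name is hypothetical, and request bodies are omitted because their schema is not described here.

```python
from urllib.request import Request

# Old route -> (new HTTP method, new path), as listed above.
ROUTE_CHANGES = {
    "/api/v1/applications/deleteApps": ("DELETE", "/api/v1/applications"),
    "/api/v1/applications/reviseLostShuffles": (
        "POST",
        "/api/v1/applications/revise_lost_shuffles",
    ),
}


def build_request(base_url, old_path):
    """Return an unsent Request for the route that replaces old_path."""
    method, new_path = ROUTE_CHANGES[old_path]
    return Request(base_url.rstrip("/") + new_path, method=method)


if __name__ == "__main__":
    # Placeholder address; substitute the real master HTTP endpoint.
    base = "http://celeborn-master.example:9098"
    for old in ROUTE_CHANGES:
        req = build_request(base, old)
        print(old, "->", req.get_method(), req.full_url)
```

Moving from `GET` to `DELETE` and `POST` keeps state-changing operations off a method that clients and proxies may treat as safe to retry or cache.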

### Why are the changes needed?
Followup for https://github.com/apache/celeborn/pull/2746

### Does this PR introduce _any_ user-facing change?

No, these APIs have not been released yet.

### How was this patch tested?
GA.

Closes #2892 from turboFei/delete_app.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-11-12 10:11:25 +08:00
Wang, Fei
7dc72b35e7 [CELEBORN-1477][FOLLOWUP] Remove scala binary version from openapi-client artifactId
### What changes were proposed in this pull request?

1. Remove the Scala binary version from the openapi-client artifactId.
2. Skip doc compilation for openapi-client, which was missed in https://github.com/apache/celeborn/pull/2641

### Why are the changes needed?

Because openapi-client is a pure Java module.

### Does this PR introduce _any_ user-facing change?

No, it has not been released.

### How was this patch tested?
GA.

Closes #2861 from turboFei/remove_Scala.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-11-01 15:08:00 +08:00
mingji
df01fadc9f [CELEBORN-1601] Support revise lost shuffles
### What changes were proposed in this pull request?
Support revising lost shuffle IDs in long-running jobs such as Flink batch jobs.

### Why are the changes needed?
1. To support revising lost shuffles.
2. To add an HTTP endpoint to revise lost shuffles manually.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Cluster tests.

Closes #2746 from FMX/b1600.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-10-21 16:44:37 +08:00
Aravind Patnam
fe01bac276 [CELEBORN-1599] Container Info REST API
### What changes were proposed in this pull request?
Add a REST API and CLI for container info. Users can configure this API based on whichever cluster manager they are using.

### Why are the changes needed?
see above

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
added UTs

Closes #2758 from akpatnam25/CELEBORN-1599.

Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-10-17 17:44:08 +08:00
SteNicholas
5dea9f13d2 [CELEBORN-1593][FOLLOWUP] Support master --show-manual-excluded-workers command to show manual excluded workers
### What changes were proposed in this pull request?

Support the `master --show-manual-excluded-workers` command to show manually excluded workers.

### Why are the changes needed?

`/api/v1/workers` returns the manually excluded workers, so the master command should be able to show them as well.

### Does this PR introduce _any_ user-facing change?

Yes, it introduces the `master --show-manual-excluded-workers` command.

### How was this patch tested?

`TestCelebornCliCommands#master --show-manual-excluded-workers`

Closes #2790 from SteNicholas/CELEBORN-1593.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-10-10 20:16:42 +08:00
Aravind Patnam
c4c32995d3 [CELEBORN-1593][FOLLOWUP] Add CLI support for some missing HTTP APIs
### What changes were proposed in this pull request?
Add CLI support for `removeWorkersUnavailableInfo`, `getDecommissioningWorkers` and `runIsDecommissioning`, which were missed in the last PR.

### Why are the changes needed?
see above

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
added unit tests

Closes #2735 from akpatnam25/CELEBORN-1593.

Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-18 15:40:02 +08:00
SteNicholas
9621d1150d [CELEBORN-1572][FOLLOWUP] master --send-worker-event should support RECOMMISSION and NONE event type
### What changes were proposed in this pull request?

`master --send-worker-event` should support the `RECOMMISSION` and `NONE` event types.

### Why are the changes needed?

The legal event types of `/api/v1/workers/events` are `IMMEDIATELY`, `DECOMMISSION`, `DECOMMISSION_THEN_IDLE`, `GRACEFUL`, `RECOMMISSION`, and `NONE`. Therefore, `master --send-worker-event` should also support the `RECOMMISSION` and `NONE` event types.
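
To illustrate the kind of check involved, here is a minimal sketch that models the six legal event types as an enum and rejects anything else. The enum and function names are hypothetical, and the actual CLI may validate its input differently.

```python
from enum import Enum


class WorkerEventType(Enum):
    """Event types accepted by /api/v1/workers/events, per the list above."""

    IMMEDIATELY = "IMMEDIATELY"
    DECOMMISSION = "DECOMMISSION"
    DECOMMISSION_THEN_IDLE = "DECOMMISSION_THEN_IDLE"
    GRACEFUL = "GRACEFUL"
    RECOMMISSION = "RECOMMISSION"
    NONE = "NONE"


def parse_event_type(name):
    """Map a command-line string to an event type, or raise ValueError."""
    try:
        return WorkerEventType[name]
    except KeyError:
        legal = ", ".join(member.name for member in WorkerEventType)
        raise ValueError(f"Unknown event type {name!r}; expected one of: {legal}") from None


if __name__ == "__main__":
    print(parse_event_type("RECOMMISSION"))
    print(parse_event_type("NONE"))
```

Deriving the accepted values from a single enum keeps the command-line check and the list of legal types from drifting apart.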

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`TestCelebornCliCommands#master --send-worker-event`

Closes #2734 from SteNicholas/CELEBORN-1572.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-13 21:32:04 +08:00
Aravind Patnam
cc26131f88 [CELEBORN-1572] Celeborn CLI initial REST API support
### What changes were proposed in this pull request?
Introduce the Celeborn CLI (based on [CIP-7](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI)). The first iteration adds support for querying the existing REST API endpoints.
A later iteration will add a layer for external cluster manager support. Further improvements, such as pretty printing, can be added in subsequent PRs.

### Why are the changes needed?
See [CIP-7](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI).

### Does this PR introduce _any_ user-facing change?
Yes, a new CLI tool.

### How was this patch tested?
Added unit tests and also tested internally.

Closes #2699 from akpatnam25/cli-CELEBORN-1572.

Lead-authored-by: Aravind Patnam <apatnam@linkedin.com>
Co-authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
2024-09-05 11:15:16 -05:00