Commit Graph

25 Commits

Author SHA1 Message Date
Wang, Fei
b7e3eaa46d [CELEBORN-1477][FOLLOWUP] Minor fix for v1 RESTful apis before release
### What changes were proposed in this pull request?

Minor fixes to the v1 RESTful APIs before the 0.6.0 release.

1. Update the API description to use upper-case worker `EventType` values.
2. Rename `subResourceConsumption` to `subResourceConsumptions`.

### Why are the changes needed?
1. With https://github.com/apache/celeborn/pull/2754, the openapi-sdk works well. However, for RESTful calls made without the SDK, the worker eventType is still case sensitive, which might be caused by the Jersey issue mentioned in https://github.com/eclipse-ee4j/jersey/issues/5288. So this PR changes the description in the Swagger spec to guide users.
<img width="1524" alt="image" src="https://github.com/user-attachments/assets/70e4f239-dc36-47bc-902e-5340986f014a" />

2. Rename `subResourceConsumption` to `subResourceConsumptions`.

### Does this PR introduce _any_ user-facing change?
No, the api has not been released.

### How was this patch tested?
GA.

Closes #3023 from turboFei/restful_minor_fix.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2025-01-02 23:00:15 +08:00
Wang, Fei
27e34ecad0 [CELEBORN-1797] Support to adjust the logger level with RESTful API during runtime
### What changes were proposed in this pull request?

Support to adjust the logger level during runtime without restarting the server.
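
As a rough illustration of changing a logger level while the process keeps running, the sketch below uses `java.util.logging` from the JDK so that it stays self-contained. That choice is an assumption made for illustration only: it is not the logging backend or the code path of this PR, and the class name `RuntimeLogLevel` and the logger name are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not code from this PR.
final class RuntimeLogLevel {
    // java.util.logging references loggers weakly, so hold them here to make
    // sure a level that was set is not lost to garbage collection.
    private static final Map<String, Logger> LOGGERS = new ConcurrentHashMap<>();

    private RuntimeLogLevel() {}

    // Changes the level of the named logger without restarting the process.
    static void setLevel(String loggerName, String levelName) {
        Logger logger = LOGGERS.computeIfAbsent(loggerName, name -> Logger.getLogger(name));
        // Level.parse throws IllegalArgumentException for an unknown level name.
        logger.setLevel(Level.parse(levelName));
    }

    // Returns the level set on the named logger, or null if it inherits from its parent.
    static Level getLevel(String loggerName) {
        return LOGGERS.computeIfAbsent(loggerName, name -> Logger.getLogger(name)).getLevel();
    }

    public static void main(String[] args) {
        setLevel("org.example.demo", "FINE");
        System.out.println(getLevel("org.example.demo"));
    }
}
```

A REST handler would typically validate the logger name and level from the request and then delegate to a call like `setLevel`.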

### Why are the changes needed?
It is useful for debugging, like the Hadoop daemonlog command: https://hadoop.apache.org/docs/r3.4.1/hadoop-project-dist/hadoop-common/CommandsManual.html#daemonlog

### Does this PR introduce _any_ user-facing change?
Yes, a new RESTful API.

### How was this patch tested?

GA.
<img width="1430" alt="image" src="https://github.com/user-attachments/assets/9d974fd9-21f3-429a-a35f-e15662aa75ac" />
<img width="1428" alt="image" src="https://github.com/user-attachments/assets/ca32b596-12a1-4038-9e1b-4fdc6a377b54" />

<img width="1255" alt="image" src="https://github.com/user-attachments/assets/5c399a73-9f53-43a8-b337-5a79621abea4" />
<img width="1244" alt="image" src="https://github.com/user-attachments/assets/16dc9ede-01bb-4e38-80fe-acb044ae6cc7" />

Closes #3022 from turboFei/log_level.

Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-12-24 11:24:30 +08:00
Wang, Fei
878a83cfa7 [CELEBORN-1750] Return struct worker resource consumption information with RESTful api
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

This information is useful for the automation tool integration.

Our automation tool periodically queries the status of all workers by calling the master `/api/v1/worker` endpoint.

During decommissioning, when a worker is in the IDLE state, we need to check whether it still holds unreleased shuffle data so that we can shut down the node without impacting users.

Previously, I had to call the worker `/api/v1/shuffles` endpoint to check that.

It is better to get all the information from the Celeborn master, because the master is HA-enabled and always reachable.

So this PR returns structured resource consumption information for automation tool integration.
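
The check an automation tool might run on each worker entry can be sketched as follows. This is only an illustration under stated assumptions: the `WorkerInfo` type, its field names, and the idea of reducing each resource consumption record to a byte count are hypothetical and do not reflect the actual response model.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical decision helper; the types and field names are assumptions.
final class DecommissionCheck {
    // Simplified view of one worker entry as the automation tool might see it.
    static final class WorkerInfo {
        final String state;
        final List<Long> consumedBytes; // one value per resource consumption record

        WorkerInfo(String state, List<Long> consumedBytes) {
            this.state = state;
            this.consumedBytes = consumedBytes;
        }
    }

    private DecommissionCheck() {}

    // A worker is treated as safe to shut down only when it is IDLE and
    // no resource consumption record reports remaining data.
    static boolean safeToShutDown(WorkerInfo worker) {
        if (!"IDLE".equals(worker.state)) {
            return false;
        }
        for (long bytes : worker.consumedBytes) {
            if (bytes > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(safeToShutDown(new WorkerInfo("IDLE", Arrays.asList(0L, 0L))));
        System.out.println(safeToShutDown(new WorkerInfo("IDLE", Arrays.asList(0L, 1024L))));
    }
}
```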

### Does this PR introduce _any_ user-facing change?

No, this RESTful api has not been released.

### How was this patch tested?
GA.

Closes #2955 from turboFei/worker_info_object.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2024-12-01 19:58:01 -08:00
Wang, Fei
59163c2a23 [CELEBORN-1745] Remove application top disk usage code
### What changes were proposed in this pull request?
Remove the code for app top disk usage on both the master and the worker.

Prefer the Prometheus expression below to find the applications with the highest disk usage.
```
topk(50, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))
```
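
For readers less familiar with PromQL, the aggregation that this expression performs (sum per `applicationId`, skip empty IDs, keep the `k` largest sums) can be sketched in plain Java. The `Sample` type and the `TopAppDiskUsage` class are hypothetical names used only for illustration, and the `role="worker"` filter is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory equivalent of topk(k, sum by (applicationId) (...)).
final class TopAppDiskUsage {
    // One metric sample: an application ID label and a bytes-written value.
    static final class Sample {
        final String applicationId;
        final double bytesWritten;

        Sample(String applicationId, double bytesWritten) {
            this.applicationId = applicationId;
            this.bytesWritten = bytesWritten;
        }
    }

    private TopAppDiskUsage() {}

    static List<Map.Entry<String, Double>> topK(List<Sample> samples, int k) {
        Map<String, Double> sums = new HashMap<>();
        for (Sample s : samples) {
            // Mirrors the applicationId!="" matcher: skip missing or empty IDs.
            if (s.applicationId == null || s.applicationId.isEmpty()) {
                continue;
            }
            // Mirrors sum by (applicationId).
            sums.merge(s.applicationId, s.bytesWritten, Double::sum);
        }
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(sums.entrySet());
        // Largest sums first, then keep at most k entries.
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        int limit = Math.max(0, Math.min(k, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```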

### Why are the changes needed?
To address comments: https://github.com/apache/celeborn/pull/2947#issuecomment-2499564978

> Due to the application dimension resource consumption, this feature should be included in the deprecated features. Maybe you can remove the codes for application top disk usage.

### Does this PR introduce _any_ user-facing change?

Yes, it removes the app top disk usage API.

### How was this patch tested?
GA.

Closes #2949 from turboFei/remove_app_top_usage.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-11-28 10:55:34 +08:00
Wang, Fei
be4c02e6d0 [CELEBORN-1601][FOLLOWUP] Refine the RESTful apis for revise lost shuffles
### What changes were proposed in this pull request?

1. `GET /api/v1/applications/deleteApps`  -> `DELETE /api/v1/applications`

2. `GET /api/v1/applications/reviseLostShuffles`  -> `POST /api/v1/applications/revise_lost_shuffles`

### Why are the changes needed?
Followup for https://github.com/apache/celeborn/pull/2746

### Does this PR introduce _any_ user-facing change?

No, these APIs have not been released yet.

### How was this patch tested?
GA.

Closes #2892 from turboFei/delete_app.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-11-12 10:11:25 +08:00
Wang, Fei
4545cdc401 [CELEBORN-1699] Add GA to check the openapi codegen
### What changes were proposed in this pull request?
Check the consistency between openapi spec and generated openapi-client code in GA.
Fix the ThreadStack model.

### Why are the changes needed?
Address comments: https://github.com/apache/celeborn/pull/2892#issuecomment-2463170494

Followup for https://github.com/apache/celeborn/pull/2888
### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

GA.

It works:
https://github.com/apache/celeborn/actions/runs/11733693060/job/32688436233?pr=2893

<img width="1059" alt="image" src="https://github.com/user-attachments/assets/84682976-1b7d-42e0-9b62-2966f3e952d7">

After the ThreadStack fix:
<img width="1368" alt="image" src="https://github.com/user-attachments/assets/14a2a08f-dbc2-409c-a4ed-fbfee82e50b5">

Closes #2893 from turboFei/diff_openapi.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-11-08 10:57:46 +08:00
SteNicholas
54bbd72bd2 [CELEBORN-1697] Improve ThreadStackTrace for thread dump
### What changes were proposed in this pull request?

Improve `ThreadStackTrace` with `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, `inNative` for thread dump.

### Why are the changes needed?

`ThreadStackTrace` does not currently include `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, or `inNative`. It is recommended to improve `ThreadStackTrace` so that thread dumps carry more stack trace details.

### Does this PR introduce _any_ user-facing change?

The response of `ThreadStack` in `/api/v1/thread_dump` adds `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, `inNative` fields.

Cherry pick:

- https://github.com/apache/spark/pull/42575
- https://github.com/apache/spark/pull/43095

### How was this patch tested?

`ApiV1BaseResourceSuite#thread_dump`

Closes #2888 from SteNicholas/CELEBORN-1697.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-11-07 20:36:58 +08:00
Wang, Fei
7dc72b35e7 [CELEBORN-1477][FOLLOWUP] Remove scala binary version from openapi-client artifactId
### What changes were proposed in this pull request?

1. Remove the Scala binary version from the openapi-client artifactId.
2. Skip the openapi-client doc compilation, which was missed in https://github.com/apache/celeborn/pull/2641

### Why are the changes needed?

Because the openapi-client is a pure Java module.

### Does this PR introduce _any_ user-facing change?

No, it has not been released.

### How was this patch tested?
GA.

Closes #2861 from turboFei/remove_Scala.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-11-01 15:08:00 +08:00
Wang, Fei
216152d038 [CELEBORN-1632] Support to apply ratis local raft_meta_conf command with RESTful api
### What changes were proposed in this pull request?
Sub-task of CELEBORN-1628.

Support to apply ratis local raft_meta_conf with RESTful api.

See https://celeborn.apache.org/docs/latest/celeborn_ratis_shell/#local-raftmetaconf
```
$ celeborn-ratis sh local raftMetaConf -peers <[P0_ID|]P0_HOST:P0_PORT,[P1_ID|]P1_HOST:P1_PORT,[P2_ID|]P2_HOST:P2_PORT> -path <PARENT_PATH_OF_RAFT_META_CONF>
```
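
To make the `-peers` argument format `[ID|]HOST:PORT` concrete, here is a small parsing sketch. It only illustrates the syntax shown in the command above; the `RaftPeerSpec` class is a hypothetical name, it is not the ratis-shell or Celeborn implementation, and how an omitted ID is defaulted is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a comma-separated list of [ID|]HOST:PORT entries.
final class RaftPeerSpec {
    final String id;   // null when the optional "ID|" prefix is omitted
    final String host;
    final int port;

    RaftPeerSpec(String id, String host, int port) {
        this.id = id;
        this.host = host;
        this.port = port;
    }

    static List<RaftPeerSpec> parse(String peers) {
        List<RaftPeerSpec> result = new ArrayList<>();
        for (String item : peers.split(",")) {
            String entry = item.trim();
            // The optional peer ID is separated from the address by '|'.
            int bar = entry.indexOf('|');
            String id = bar >= 0 ? entry.substring(0, bar) : null;
            String address = entry.substring(bar + 1);
            // The port follows the last ':' of the address.
            int colon = address.lastIndexOf(':');
            if (colon <= 0 || colon == address.length() - 1) {
                throw new IllegalArgumentException("Expected [ID|]HOST:PORT but got: " + item);
            }
            String host = address.substring(0, colon);
            int port = Integer.parseInt(address.substring(colon + 1));
            result.add(new RaftPeerSpec(id, host, port));
        }
        return result;
    }
}
```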

The implementation is the same as e96ed1a338/ratis-shell/src/main/java/org/apache/ratis/shell/cli/sh/local/RaftMetaConfCommand.java (L122-L133)

### Why are the changes needed?

We have implemented RESTful APIs for all the other ratis-shell commands.

<img width="1219" alt="image" src="https://github.com/user-attachments/assets/4367ddbd-3c55-449a-a1bc-75d6c18e8918">

| Ratis Shell               | RESTful api                        |
|----------------------|---------------------------------|
| election transfer    | `/ratis/election/transfer`      |
| election stepDown    | `/ratis/election/step_down`     |
| election pause       | `/ratis/election/pause`         |
| election resume      | `/ratis/election/resume`        |
| group info           | `/masters`                      |
| peer add             | `/ratis/peer/add`               |
| peer remove          | `/ratis/peer/remove`            |
| peer setPriority     | `/ratis/peer/set_priority`      |
| snapshot create      | `/ratis/snapshot/create`        |

And the local raftMetaConf command is the last one.

I closed the ticket CELEBORN-1632 earlier because I thought it was a local command and wondered whether implementing it as a RESTful API was necessary.

But we have implemented all the others, so I decided to implement it as well.

### Does this PR introduce _any_ user-facing change?

A new API.

The implementation is the same as e96ed1a338/ratis-shell/src/main/java/org/apache/ratis/shell/cli/sh/local/RaftMetaConfCommand.java (L122-L133)

### How was this patch tested?
![image](https://github.com/user-attachments/assets/088d8523-e5f5-4546-9159-e12191fd8a29)
![image](https://github.com/user-attachments/assets/ce9c4284-fd61-45de-93e7-d38e3b6afac9)
<img width="960" alt="image" src="https://github.com/user-attachments/assets/b302a680-baea-4709-b77f-a2b1946b8dff">

<img width="1471" alt="image" src="https://github.com/user-attachments/assets/4bf090ba-c6f4-4f49-aa57-8dd2c897ff30">
<img width="871" alt="image" src="https://github.com/user-attachments/assets/9959072c-5e96-48f5-911e-546c05a0c443">

Closes #2829 from turboFei/local_raft_conf.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-10-24 16:09:18 +08:00
mingji
df01fadc9f [CELEBORN-1601] Support revise lost shuffles
### What changes were proposed in this pull request?
Support revising lost shuffle IDs in long-running jobs such as Flink batch jobs.

### Why are the changes needed?
1. To support revising lost shuffles.
2. To add an HTTP endpoint to revise lost shuffles manually.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Cluster tests.

Closes #2746 from FMX/b1600.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-10-21 16:44:37 +08:00
Aravind Patnam
fe01bac276 [CELEBORN-1599] Container Info REST API
### What changes were proposed in this pull request?
Add a REST API and a CLI command for container info. Users can configure this API based on whichever cluster manager they are using.

### Why are the changes needed?
see above

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
added UTs

Closes #2758 from akpatnam25/CELEBORN-1599.

Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-10-17 17:44:08 +08:00
Wang, Fei
568e335ada [CELEBORN-1630] Support to apply ratis peer operation with RESTful api
### What changes were proposed in this pull request?

Sub task of CELEBORN-1628.

Mapping for the commands below:
```
$ celeborn-ratis sh peer add -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> [-groupid <RAFT_GROUP_ID>] -address <P4_HOST:P4_PORT,...,PN_HOST:PN_PORT>
```

```
$ celeborn-ratis sh peer remove -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> [-groupid <RAFT_GROUP_ID>] -address <P0_HOST:P0_PORT,...>
```

```
$ celeborn-ratis sh peer setPriority -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> [-groupid <RAFT_GROUP_ID>] -addressPriority <P0_HOST:P0_PORT|PRIORITY>
```

### Why are the changes needed?

It is more convenient to apply the ratis operation with RESTful api.

### Does this PR introduce _any_ user-facing change?
No, new api.

### How was this patch tested?
Integration testing, with screenshots below.

Add:
<img width="1619" alt="image" src="https://github.com/user-attachments/assets/ab4e24bb-3a99-40da-9972-231c9dc7c46c">

Remove:
<img width="1654" alt="image" src="https://github.com/user-attachments/assets/71133818-3259-47f5-be75-0715efe97361">

Set peer priority:
<img width="1510" alt="image" src="https://github.com/user-attachments/assets/e31b3701-71c1-46fd-872b-5227fb89f6fe">

Closes #2804 from turboFei/peer_raft.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-10-17 11:01:12 +08:00
Wang, Fei
2c0fdca04f [CELEBORN-1631] Support to apply ratis snapshot operation with RESTful api
### What changes were proposed in this pull request?
Sub task of CELEBORN-1628

### Why are the changes needed?

Mapping for ratis-shell command:

```
$ celeborn-ratis sh snapshot create -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> -peerId <peerId0> [-groupid <RAFT_GROUP_ID>]
```
It is helpful for automation integration.

### Does this PR introduce _any_ user-facing change?

No, it is a new API.

### How was this patch tested?
Integration testing.

<img width="1259" alt="image" src="https://github.com/user-attachments/assets/1f74a899-17f3-41b3-911a-36374bf0fd0b">

Closes #2803 from turboFei/snapshot_create.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-10-15 21:45:36 +08:00
Wang, Fei
111781f8f1 [CELEBORN-1633] Return more raft group information
### What changes were proposed in this pull request?

As title, sub task of [CELEBORN-1628](https://issues.apache.org/jira/browse/CELEBORN-1628)

### Why are the changes needed?

To return more peer info, such as priority, and also to return the group log info.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
<img width="1607" alt="image" src="https://github.com/user-attachments/assets/4f5cef5a-d42b-47be-888f-46ad05f7105d">

Closes #2780 from turboFei/more_group_info.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-10-11 10:26:35 +08:00
Wang, Fei
ab12b34a1d [CELEBORN-1629] Support to apply ratis election operation with RESTful api
### What changes were proposed in this pull request?

As title, sub task of [CELEBORN-1628](https://issues.apache.org/jira/browse/CELEBORN-1628)
### Why are the changes needed?

It is more user-friendly and is necessary for automation integration.

<img width="1325" alt="image" src="https://github.com/user-attachments/assets/b9270bd5-75ee-41a9-b7aa-1db1270b4edb">

Before restarting or recreating the Celeborn master pod:
- call `ratis/snapshot/create` to create a snapshot (another PR will be raised for this)
- if a failover is needed:
    - call `ratis/election/pause` to pause the election
    - call `ratis/election/step_down` or `ratis/election/transfer` to transfer the leadership
    - finally, call `ratis/election/resume` to resume the election
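
The sequence above can be sketched as an ordered plan that an automation tool might execute before touching the master pod. This is a minimal sketch under stated assumptions: the `MasterRestartPlan` class and its parameters are hypothetical, the paths are copied from the list above without any base URL or API prefix, and issuing the HTTP calls themselves is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner that returns the endpoint paths in the order to call them.
final class MasterRestartPlan {
    private MasterRestartPlan() {}

    // failover: whether leadership has to move away from this master.
    // transferLeadership: use election/transfer instead of election/step_down.
    static List<String> steps(boolean failover, boolean transferLeadership) {
        List<String> steps = new ArrayList<>();
        // Always take a snapshot first.
        steps.add("ratis/snapshot/create");
        if (failover) {
            // Pause elections, move the leadership, then resume elections.
            steps.add("ratis/election/pause");
            steps.add(transferLeadership ? "ratis/election/transfer" : "ratis/election/step_down");
            steps.add("ratis/election/resume");
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : steps(true, false)) {
            System.out.println(step);
        }
    }
}
```

A real tool would stop at the first failed call rather than continuing with the remaining steps.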

### Does this PR introduce _any_ user-facing change?
Introduce new RESTful apis.

### How was this patch tested?

election transfer:
<img width="1614" alt="image" src="https://github.com/user-attachments/assets/42e39478-e6c7-4f8a-91bb-a6b35e2098c5">

election pause and resume:

<img width="727" alt="image" src="https://github.com/user-attachments/assets/5ead4739-d45f-47d4-a951-9784a6495add">

election step down:
<img width="1719" alt="image" src="https://github.com/user-attachments/assets/ebb3b722-147e-4db2-971d-17f4458b98ed">

Closes #2779 from turboFei/ratis_rest.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-10-11 10:24:44 +08:00
Wang, Fei
909d6c3b9c [CELEBORN-1477][FOLLOWUP] Upgrade openapi-generator to 7.8.0
### What changes were proposed in this pull request?
This pr is a followup for https://github.com/apache/celeborn/pull/2641

In the above PR, I upgraded the version to 7.7.0, and two of the generated Java files lacked Apache license headers.

I then raised a PR at https://github.com/OpenAPITools/openapi-generator/pull/19273 to follow up on it, and the fix was released in https://github.com/OpenAPITools/openapi-generator/releases/tag/v7.8.0.

### Why are the changes needed?

Upgrade to the latest openapi-generator version to resolve the unlicensed java files.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing GA.

Closes #2695 from turboFei/openapi_upgrade.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-24 16:02:09 +08:00
Wang, Fei
1596cffb39 [CELEBORN-1607] Enable useEnumCaseInsensitive for openapi-generator
### What changes were proposed in this pull request?
Enable `useEnumCaseInsensitive` for openapi-generator.
On the Celeborn server side, the enum is then mapped to Celeborn's internal WorkerEventType.

### Why are the changes needed?

I hit an exception when sending a worker event with the OpenAPI SDK.
```
Exception in thread "main" ApiException{code=400, responseHeaders={Server=[Jetty(9.4.52.v20230823)], Content-Length=[491], Date=[Fri, 20 Sep 2024 23:50:27 GMT], Content-Type=[text/plain]}, responseBody='Cannot deserialize value of type `org.apache.celeborn.rest.v1.model.SendWorkerEventRequest$EventTypeEnum` from String "DecommissionThenIdle": not one of the values accepted for Enum class: [DECOMMISSION_THEN_IDLE, GRACEFUL, NONE, DECOMMISSION, IMMEDIATELY, RECOMMISSION]
 at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 14] (through reference chain: org.apache.celeborn.rest.v1.model.SendWorkerEventRequest["eventType"])'}
    at org.apache.celeborn.rest.v1.master.invoker.ApiClient.processResponse(ApiClient.java:913)
    at org.apache.celeborn.rest.v1.master.invoker.ApiClient.invokeAPI(ApiClient.java:1000)
    at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:378)
    at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:334)
    at org.example.Main.main(Main.java:22)

```

The test code to reproduce it:
```
package org.example;

import org.apache.celeborn.rest.v1.master.WorkerApi;
import org.apache.celeborn.rest.v1.master.invoker.ApiClient;
import org.apache.celeborn.rest.v1.model.ExcludeWorkerRequest;
import org.apache.celeborn.rest.v1.model.SendWorkerEventRequest;
import org.apache.celeborn.rest.v1.model.WorkerId;

public class Main {
    public static void main(String[] args) throws Exception {

        String cmUrl = "http://localhost:9098";
        WorkerApi workerApi = new WorkerApi(new ApiClient().setBasePath(cmUrl));
        workerApi.excludeWorker(new ExcludeWorkerRequest()
                .addAddItem(new WorkerId()
                        .host("localhost")
                        .rpcPort(1)
                        .pushPort(2)
                        .fetchPort(3)
                        .replicatePort(4)));
        workerApi.sendWorkerEvent(new SendWorkerEventRequest()
                        .addWorkersItem(new WorkerId()
                                .host("127.0.0.1")
                                .rpcPort(56116)
                                .pushPort(56117)
                                .fetchPort(56119)
                                .replicatePort(56118))
                .eventType(SendWorkerEventRequest.EventTypeEnum.DECOMMISSION_THEN_IDLE));
    }
}
```

This seems to happen because, for `EventTypeEnum`, the constant name and its value are not the same.

I am not sure why the UT passed while the integration test failed.

Because the `EventTypeEnum` value is case sensitive, we hit this issue.

8734d16638/openapi/openapi-client/src/main/java/org/apache/celeborn/rest/v1/model/SendWorkerEventRequest.java (L47-L83)

I think this is related to an issue on the Jersey side: https://github.com/eclipse-ee4j/jersey/issues/5288

In this PR, `useEnumCaseInsensitive` is enabled for openapi-generator.
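
The name/value mismatch can be illustrated with a standalone enum. This is a simplified sketch, not the generated `SendWorkerEventRequest.EventTypeEnum`: the constant names come from the exception message above, the wire values are the legal event types of the worker event API, and the class and method names (`EventTypeLookup`, `fromValueStrict`, `fromValueIgnoreCase`, `matchesConstantName`) are hypothetical.

```java
// Hypothetical illustration of looking up an enum by constant name vs. by value.
final class EventTypeLookup {
    enum EventType {
        NONE("None"),
        IMMEDIATELY("Immediately"),
        DECOMMISSION("Decommission"),
        DECOMMISSION_THEN_IDLE("DecommissionThenIdle"),
        GRACEFUL("Graceful"),
        RECOMMISSION("Recommission");

        final String value; // the string sent over the wire

        EventType(String value) {
            this.value = value;
        }
    }

    private EventTypeLookup() {}

    // Lookup by constant name, which is what failed in the exception above.
    static boolean matchesConstantName(String text) {
        try {
            EventType.valueOf(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Case-sensitive lookup by wire value.
    static EventType fromValueStrict(String text) {
        for (EventType type : EventType.values()) {
            if (type.value.equals(text)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unexpected value '" + text + "'");
    }

    // Case-insensitive lookup by wire value.
    static EventType fromValueIgnoreCase(String text) {
        for (EventType type : EventType.values()) {
            if (type.value.equalsIgnoreCase(text)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unexpected value '" + text + "'");
    }
}
```

With this sketch, `"DecommissionThenIdle"` does not match any constant name, while both value-based lookups resolve it to `DECOMMISSION_THEN_IDLE`; only the case-insensitive variant also accepts `"decommissionthenidle"`.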

### Does this PR introduce _any_ user-facing change?
No, there is no user-facing change, and this SDK has not been released yet.

### How was this patch tested?
Existing UT and Integration testing.
<img width="1265" alt="image" src="https://github.com/user-attachments/assets/6a34a0dd-c474-4e8d-b372-19b0fda94972">

Closes #2754 from turboFei/eventTypeEnumMapping.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-23 20:43:04 +08:00
sychen
589100ea91 [CELEBORN-1600] Enable check server dependencies
### What changes were proposed in this pull request?

### Why are the changes needed?
The server module is missing dependency checks.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2742 from cxzl25/check_server_deps.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-20 15:14:56 +08:00
SteNicholas
baef31abb8 [CELEBORN-1477][FOLLOWUP] /api/v1/workers/events should support None eventType to align /sendWorkerEvent
### What changes were proposed in this pull request?

`/api/v1/workers/events` should support `None` `eventType` to align `/sendWorkerEvent`.

### Why are the changes needed?

Legal event types of `/sendWorkerEvent` are `None`, `Immediately`, `Decommission`, `DecommissionThenIdle`, `Graceful`, `Recommission`. But `/api/v1/workers/events` does not support `eventType` with `None` type.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`ApiV1MasterResourceSuite#worker resource`

Closes #2732 from SteNicholas/CELEBORN-1477.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-09-14 11:14:23 +08:00
Wang, Fei
ae41cb5ade [CELEBORN-1537] Support to remove workers unavailable info with RESTful api
### What changes were proposed in this pull request?
In [CELEBORN-1535](https://issues.apache.org/jira/browse/CELEBORN-1535), we added support for disabling the master workerUnavailableInfo expiration.

This PR introduces a new REST API for manually removing unavailable workers, which can be used on demand.

### Why are the changes needed?
To clean up the workers' unavailable info manually on demand when expiration is disabled.

### Does this PR introduce _any_ user-facing change?
Yes, a new RESTful API.

### How was this patch tested?
UT.

Closes #2658 from turboFei/support_cleanup.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-19 11:10:33 +08:00
Wang, Fei
1515ed38b2 [CELEBORN-1477] Using openapi-generator apache-httpclient library instead of jersey2
### What changes were proposed in this pull request?
We previously used the `jersey2` library for celeborn-openapi-client, and I found that the shaded celeborn-openapi-client lacks some dependencies.
I tried to raise [PR #2640] to fix it, but it seems difficult to maintain the transitive dependencies that come from Jersey.

I then received a suggestion from Pan to migrate the library from `jersey2` to `apache-httpclient`.

For reference, see https://openapi-generator.tech/docs/generators/java/

<img width="500" alt="image" src="https://github.com/user-attachments/assets/d102a7c9-46cd-4fd7-a2a0-7396a815776d">

To leverage the latest openapi-generator plugin, I upgraded openapi-generator to the latest version, 7.7.0, which requires JDK 11+.
Because Celeborn has not dropped Java 8 support so far, I included the generated code in the repo and added a user guide for regenerating it.

### Why are the changes needed?

To fix the dependency leak issue and make the dependencies easier to maintain.

### Does this PR introduce _any_ user-facing change?

No, this SDK has not been released, so no user-facing change.

### How was this patch tested?

Tested with a sample Maven project.

pom.xml:
```
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>test_openapi</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.celeborn</groupId>
            <artifactId>celeborn-openapi-client_2.12</artifactId>
            <version>0.6.0-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>
```

Testing code:
```
package org.example;

import org.apache.celeborn.rest.v1.master.MasterApi;
import org.apache.celeborn.rest.v1.master.WorkerApi;
import org.apache.celeborn.rest.v1.master.invoker.ApiClient;

public class Main {
    public static void main(String[] args) throws Exception {

        String cmUrl = "http://***:9098";
        MasterApi masterApi  = new MasterApi(new ApiClient().setBasePath(cmUrl));
        System.out.println(masterApi.getMasterGroupInfo().getLeader().getAddress().split(":")[0]);
        WorkerApi workerApi = new WorkerApi(new ApiClient().setBasePath(cmUrl));
        System.out.println(workerApi.getWorkers());
        System.out.println(workerApi.getWorkerEvents());
    }
}
```

```
java -Dfile.encoding=UTF-8 -classpath /Users/fwang12/todo/test_openapi/target/classes:/Users/fwang12/todo/celeborn/openapi/openapi-client/target/celeborn-openapi-client_2.12-0.6.0-SNAPSHOT.jar org.example.Main
```

<img width="1727" alt="image" src="https://github.com/user-attachments/assets/2da8b126-be96-4c37-9a33-ba196024f2ba">

Closes #2641 from turboFei/appache_httpclient.

Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-07-31 15:02:41 +08:00
zhaohehuhu
7a596bbed1 [CELEBORN-1469] Support writing shuffle data to OSS(S3 only)
### What changes were proposed in this pull request?

as title

### Why are the changes needed?

Now, Celeborn doesn't support sinking shuffle data directly to Amazon S3, which could be a limitation when we're trying to move on-premises servers to AWS and use S3 as a data sink for shuffled data.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Closes #2579 from zhaohehuhu/dev-0619.

Authored-by: zhaohehuhu <luoyedeyi@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-07-24 11:59:15 +08:00
Fei Wang
cdbf48ee37 [CELEBORN-1477][FOLLOWUP] Shade the dependencies for openapi-client
### What changes were proposed in this pull request?

Shade the dependencies for openapi-client.

### Why are the changes needed?

To prevent dependency conflicts when the openapi-client is included in client applications.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the UT.

Closes #2615 from turboFei/openapi_client_shade.

Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-07-16 14:37:12 +08:00
Wang, Fei
0b8c9fdd4c [CELEBORN-1505] Algin the celeborn server jackson dependency versions
### What changes were proposed in this pull request?

Now there are three different jackson versions in the server dependency list.

It is better to align them.

### Why are the changes needed?
To align the dependency versions and reduce the conflicts in the future.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
Pass the GA.

Closes #2620 from turboFei/align_jackson.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-07-15 11:00:23 +08:00
Fei Wang
d698a69edc [CELEBORN-1477][CIP-9] Refine the celeborn RESTful APIs
### What changes were proposed in this pull request?

This PR is for [CIP-9 Refine the celeborn RESTful APIs](https://docs.google.com/document/d/1LV2vV-w3XtlbJj2Vi4J77mt4IYCr40-8A_JncZLsHqs/edit?usp=sharing).

We leverage [openapi-generator](https://github.com/OpenAPITools/openapi-generator) to generate the client and model code.

### Why are the changes needed?

Celeborn has implemented RESTful APIs for monitoring and administrative operations on both master and worker endpoints. These APIs enable tasks such as configuration checks, status viewing of master/worker nodes, worker decommissioning/recommissioning, and more. They provide crucial insights and support for DevOps.
The primary concern with the existing API is the response content type, which is `text/plain` rather than the more widely accepted `application/json`. This mismatch makes integration with DevOps tools challenging, as these tools typically require JSON-formatted responses for seamless parsing and automation.
I also saw the need for REST API evolution in the [Apache Celeborn CLI Proposal](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI).

### Does this PR introduce _any_ user-facing change?
This PR introduces a new API namespace: `/api/v1`. This approach allows us to maintain the current API for compatibility while offering an improved version.

### How was this patch tested?
UT.

Closes #2599 from turboFei/cip_9_openapi.

Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-07-11 10:57:00 +08:00