### What changes were proposed in this pull request?
Minor fixes to the v1 RESTful APIs before the 0.6.0 release.
1. Update the API description to use the upper-case worker `eventType`.
2. `subResourceConsumption` => `subResourceConsumptions`.
### Why are the changes needed?
1. With https://github.com/apache/celeborn/pull/2754, the openapi-sdk works well. However, for RESTful calls made without the SDK, the worker `eventType` is still case sensitive, which might be caused by the jersey issue described in https://github.com/eclipse-ee4j/jersey/issues/5288. So in this PR, I changed the description in the swagger to guide users.
<img width="1524" alt="image" src="https://github.com/user-attachments/assets/70e4f239-dc36-47bc-902e-5340986f014a" />
2. Rename `subResourceConsumption` to `subResourceConsumptions`.
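For callers that build raw RESTful requests, the upper-case guidance in item 1 can be sketched as a small conversion helper. This is only an illustration: the class and method names are hypothetical and not part of Celeborn; only the two spellings (`DecommissionThenIdle` and `DECOMMISSION_THEN_IDLE`) come from the related PRs.

```java
public class WorkerEventTypeNames {
    /**
     * Converts a CamelCase event type name such as "DecommissionThenIdle"
     * into the upper-case form "DECOMMISSION_THEN_IDLE".
     * Hypothetical helper for illustration only.
     */
    public static String toUpperSnakeCase(String camelCase) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camelCase.length(); i++) {
            char c = camelCase.charAt(i);
            // Insert an underscore at each lower-to-upper boundary.
            if (i > 0 && Character.isUpperCase(c)
                    && Character.isLowerCase(camelCase.charAt(i - 1))) {
                sb.append('_');
            }
            sb.append(Character.toUpperCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUpperSnakeCase("DecommissionThenIdle"));
        System.out.println(toUpperSnakeCase("Graceful"));
    }
}
```

Input that is already upper case passes through unchanged, so the helper can be applied unconditionally before sending a request.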
### Does this PR introduce _any_ user-facing change?
No, the API has not been released.
### How was this patch tested?
GA.
Closes #3023 from turboFei/restful_minor_fix.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
This information is useful for automation tool integration.
Our automation tool periodically queries the status of all workers by calling master `/api/v1/worker`.
During decommissioning, when a worker is in the IDLE state, we need to check whether there is still unreleased shuffle data on it, so that we can shut down the node without user impact.
Previously, I had to call the worker `/api/v1/shuffles` to check that.
It is better to get all the information from the Celeborn master, because the master is HA enabled and always reachable.
So this PR returns the structured resource consumption for automation tool integration.
### Does this PR introduce _any_ user-facing change?
No, this RESTful API has not been released.
### How was this patch tested?
GA.
Closes #2955 from turboFei/worker_info_object.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
Remove the code for app top disk usage on both the master and the worker.
Instead, use the Prometheus expression below to find the apps with the highest disk usage.
```
topk(50, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))
```
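As a rough sketch of how a tool could submit this expression, the snippet below builds a request URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The Prometheus address `http://localhost:9090` and the class name are assumptions for illustration; only the expression itself comes from this PR.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TopAppDiskUsageQuery {
    // PromQL expression quoted from the PR description.
    static final String EXPR =
        "topk(50, sum by (applicationId) "
            + "(metrics_diskBytesWritten_Value{role=\"worker\", applicationId!=\"\"}))";

    /** Builds an instant-query URL; prometheusBase is an assumed address. */
    public static String buildQueryUrl(String prometheusBase) {
        try {
            // Form-encode the expression so it is safe inside a query string.
            return prometheusBase + "/api/v1/query?query=" + URLEncoder.encode(EXPR, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl("http://localhost:9090"));
    }
}
```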
### Why are the changes needed?
To address comments: https://github.com/apache/celeborn/pull/2947#issuecomment-2499564978
> Due to the application dimension resource consumption, this feature should be included in the deprecated features. Maybe you can remove the codes for application top disk usage.
### Does this PR introduce _any_ user-facing change?
Yes, this removes the app top disk usage API.
### How was this patch tested?
GA.
Closes #2949 from turboFei/remove_app_top_usage.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
1. `GET /api/v1/applications/deleteApps` -> `DELETE /api/v1/applications`
2. `GET /api/v1/applications/reviseLostShuffles` -> `POST /api/v1/applications/revise_lost_shuffles`
### Why are the changes needed?
Follow-up to https://github.com/apache/celeborn/pull/2746
### Does this PR introduce _any_ user-facing change?
No, these APIs have not been released yet.
### How was this patch tested?
GA.
Closes #2892 from turboFei/delete_app.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve `ThreadStackTrace` with `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, `inNative` for thread dump.
### Why are the changes needed?
At present, `ThreadStackTrace` does not include `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, or `inNative`. It is recommended to add these fields to `ThreadStackTrace` so that the thread dump carries more detail about each thread's stack trace.
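For context, the JDK exposes comparable information through `java.lang.management.ThreadInfo`. The sketch below is not Celeborn's `ThreadStackTrace` implementation; it only assumes that the new fields correspond to the `ThreadInfo` accessors of similar names, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    /** Formats the lock-related details of a single thread. */
    public static String describe(ThreadInfo info) {
        return String.format(
            "%s lockName=%s lockOwnerName=%s suspended=%b inNative=%b monitors=%d synchronizers=%d",
            info.getThreadName(),
            info.getLockName(),
            info.getLockOwnerName(),
            info.isSuspended(),
            info.isInNative(),
            info.getLockedMonitors().length,
            info.getLockedSynchronizers().length);
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details when the JVM supports them.
        ThreadInfo[] infos = bean.dumpAllThreads(
            bean.isObjectMonitorUsageSupported(),
            bean.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            System.out.println(describe(info));
        }
    }
}
```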
### Does this PR introduce _any_ user-facing change?
The response of `ThreadStack` in `/api/v1/thread_dump` adds `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, `inNative` fields.
Cherry pick:
- https://github.com/apache/spark/pull/42575
- https://github.com/apache/spark/pull/43095
### How was this patch tested?
`ApiV1BaseResourceSuite#thread_dump`
Closes #2888 from SteNicholas/CELEBORN-1697.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Remove the Scala binary version from the openapi-client artifactId.
2. Skip the openapi-client doc compile, which was missed in https://github.com/apache/celeborn/pull/2641
### Why are the changes needed?
Because the openapi-client is a pure Java module.
### Does this PR introduce _any_ user-facing change?
No, it has not been released.
### How was this patch tested?
GA.
Closes #2861 from turboFei/remove_Scala.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support revising lost shuffle IDs in long-running jobs such as Flink batch jobs.
### Why are the changes needed?
1. To support revising lost shuffles.
2. To add an HTTP endpoint to revise lost shuffles manually.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Cluster tests.
Closes #2746 from FMX/b1600.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Add a REST API and CLI for container info. Users can configure this API to work with whichever cluster manager they are using.
### Why are the changes needed?
See above.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added UTs.
Closes #2758 from akpatnam25/CELEBORN-1599.
Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Subtask of CELEBORN-1628.
This adds RESTful API mappings for the following commands:
```
$ celeborn-ratis sh peer add -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> [-groupid <RAFT_GROUP_ID>] -address <P4_HOST:P4_PORT,...,PN_HOST:PN_PORT>
```
```
$ celeborn-ratis sh peer remove -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> [-groupid <RAFT_GROUP_ID>] -address <P0_HOST:P0_PORT,...>
```
```
$ celeborn-ratis sh peer setPriority -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> [-groupid <RAFT_GROUP_ID>] -addressPriority <P0_HOST:P0_PORT|PRIORITY>
```
### Why are the changes needed?
It is more convenient to perform the ratis operations through a RESTful API.
### Does this PR introduce _any_ user-facing change?
No, it is a new API.
### How was this patch tested?
Integration testing. Screenshots follow.
Add:
<img width="1619" alt="image" src="https://github.com/user-attachments/assets/ab4e24bb-3a99-40da-9972-231c9dc7c46c">
Remove:
<img width="1654" alt="image" src="https://github.com/user-attachments/assets/71133818-3259-47f5-be75-0715efe97361">
Set peer priority:
<img width="1510" alt="image" src="https://github.com/user-attachments/assets/e31b3701-71c1-46fd-872b-5227fb89f6fe">
Closes #2804 from turboFei/peer_raft.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Subtask of CELEBORN-1628.
### Why are the changes needed?
Mapping for ratis-shell command:
```
$ celeborn-ratis sh snapshot create -peers <P0_HOST:P0_PORT,P1_HOST:P1_PORT,P2_HOST:P2_PORT> -peerId <peerId0> [-groupid <RAFT_GROUP_ID>]
```
It is helpful for automation integration.
### Does this PR introduce _any_ user-facing change?
No, it is a new API.
### How was this patch tested?
Integration testing.
<img width="1259" alt="image" src="https://github.com/user-attachments/assets/1f74a899-17f3-41b3-911a-36374bf0fd0b">
Closes #2803 from turboFei/snapshot_create.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
As title, subtask of [CELEBORN-1628](https://issues.apache.org/jira/browse/CELEBORN-1628)
### Why are the changes needed?
To return more peer info, such as priority, and also return the group log info.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
<img width="1607" alt="image" src="https://github.com/user-attachments/assets/4f5cef5a-d42b-47be-888f-46ad05f7105d">
Closes #2780 from turboFei/more_group_info.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR is a follow-up to https://github.com/apache/celeborn/pull/2641
In that PR, I upgraded the openapi-generator version to 7.7.0, and two of the generated Java files lacked Apache license headers.
I then raised https://github.com/OpenAPITools/openapi-generator/pull/19273 to fix that, and the fix was released in https://github.com/OpenAPITools/openapi-generator/releases/tag/v7.8.0.
### Why are the changes needed?
Upgrade to the latest openapi-generator version to resolve the unlicensed Java files.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing GA.
Closes #2695 from turboFei/openapi_upgrade.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Enable `useEnumCaseInsensitive` for openapi-generator.
On the Celeborn server side, the enum is then mapped to Celeborn's internal `WorkerEventType`.
### Why are the changes needed?
I hit an exception when sending a worker event with the openapi SDK.
```
Exception in thread "main" ApiException{code=400, responseHeaders={Server=[Jetty(9.4.52.v20230823)], Content-Length=[491], Date=[Fri, 20 Sep 2024 23:50:27 GMT], Content-Type=[text/plain]}, responseBody='Cannot deserialize value of type `org.apache.celeborn.rest.v1.model.SendWorkerEventRequest$EventTypeEnum` from String "DecommissionThenIdle": not one of the values accepted for Enum class: [DECOMMISSION_THEN_IDLE, GRACEFUL, NONE, DECOMMISSION, IMMEDIATELY, RECOMMISSION]
at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 14] (through reference chain: org.apache.celeborn.rest.v1.model.SendWorkerEventRequest["eventType"])'}
at org.apache.celeborn.rest.v1.master.invoker.ApiClient.processResponse(ApiClient.java:913)
at org.apache.celeborn.rest.v1.master.invoker.ApiClient.invokeAPI(ApiClient.java:1000)
at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:378)
at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:334)
at org.example.Main.main(Main.java:22)
```
The test code to reproduce it:
```
package org.example;

import org.apache.celeborn.rest.v1.master.WorkerApi;
import org.apache.celeborn.rest.v1.master.invoker.ApiClient;
import org.apache.celeborn.rest.v1.model.ExcludeWorkerRequest;
import org.apache.celeborn.rest.v1.model.SendWorkerEventRequest;
import org.apache.celeborn.rest.v1.model.WorkerId;

public class Main {
    public static void main(String[] args) throws Exception {
        String cmUrl = "http://localhost:9098";
        WorkerApi workerApi = new WorkerApi(new ApiClient().setBasePath(cmUrl));

        // Exclude a worker identified by its host and ports.
        workerApi.excludeWorker(new ExcludeWorkerRequest()
            .addAddItem(new WorkerId()
                .host("localhost")
                .rpcPort(1)
                .pushPort(2)
                .fetchPort(3)
                .replicatePort(4)));

        // Send a DECOMMISSION_THEN_IDLE event; this is the call that fails with HTTP 400.
        workerApi.sendWorkerEvent(new SendWorkerEventRequest()
            .addWorkersItem(new WorkerId()
                .host("127.0.0.1")
                .rpcPort(56116)
                .pushPort(56117)
                .fetchPort(56119)
                .replicatePort(56118))
            .eventType(SendWorkerEventRequest.EventTypeEnum.DECOMMISSION_THEN_IDLE));
    }
}
```
The issue seems to occur because, for `EventTypeEnum`, the enum name and its value are not the same.
I am not sure why the UT passed while the integration testing failed.
Because the `EventTypeEnum` value is case sensitive, we hit this issue.
8734d16638/openapi/openapi-client/src/main/java/org/apache/celeborn/rest/v1/model/SendWorkerEventRequest.java (L47-L83)
I think the related issue on the jersey side is https://github.com/eclipse-ee4j/jersey/issues/5288
In this PR, `useEnumCaseInsensitive` is enabled for openapi-generator.
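The name/value mismatch can be illustrated with a small stand-alone enum. This is a hypothetical sketch, not the generated `SendWorkerEventRequest.EventTypeEnum`: it assumes the enum constant is `DECOMMISSION_THEN_IDLE` while its wire value is `DecommissionThenIdle` (as the error message above suggests), and it shows a lookup that compares values case-insensitively.

```java
public class EventTypeLookup {
    /** Hypothetical enum whose constant name differs from its wire value. */
    public enum EventType {
        DECOMMISSION_THEN_IDLE("DecommissionThenIdle"),
        GRACEFUL("Graceful");

        private final String value;

        EventType(String value) {
            this.value = value;
        }

        public String getValue() {
            return value;
        }

        /** Looks up a constant by its wire value, ignoring case. */
        public static EventType fromValue(String raw) {
            for (EventType t : values()) {
                if (t.value.equalsIgnoreCase(raw)) {
                    return t;
                }
            }
            throw new IllegalArgumentException("Unexpected value '" + raw + "'");
        }
    }

    public static void main(String[] args) {
        // Lookup by value succeeds regardless of case.
        System.out.println(EventType.fromValue("decommissionthenidle"));
        try {
            // Lookup by constant name fails for the wire value.
            EventType.valueOf("DecommissionThenIdle");
        } catch (IllegalArgumentException e) {
            System.out.println("valueOf rejected the wire value");
        }
    }
}
```

A deserializer that matches on constant names rejects the wire value, which is consistent with the 400 response shown earlier; matching on the value avoids that.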
### Does this PR introduce _any_ user-facing change?
No, there is no user-facing change, and this SDK has not been released yet.
### How was this patch tested?
Existing UT and Integration testing.
<img width="1265" alt="image" src="https://github.com/user-attachments/assets/6a34a0dd-c474-4e8d-b372-19b0fda94972">
Closes #2754 from turboFei/eventTypeEnumMapping.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
Server module missing checks.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA
Closes #2742 from cxzl25/check_server_deps.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
`/api/v1/workers/events` should support the `None` `eventType` to align with `/sendWorkerEvent`.
### Why are the changes needed?
Legal event types of `/sendWorkerEvent` are `None`, `Immediately`, `Decommission`, `DecommissionThenIdle`, `Graceful`, `Recommission`. But `/api/v1/workers/events` does not support `eventType` with `None` type.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`ApiV1MasterResourceSuite#worker resource`
Closes #2732 from SteNicholas/CELEBORN-1477.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
In [CELEBORN-1535](https://issues.apache.org/jira/browse/CELEBORN-1535), we added support for disabling the master's workerUnavailableInfo expiration.
This PR introduces a new REST API for manually removing unavailable workers, which can then be used on demand.
### Why are the changes needed?
To clean up the workers' unavailable info manually on demand when the expiration is disabled.
### Does this PR introduce _any_ user-facing change?
Yes, a new RESTful API.
### How was this patch tested?
UT.
Closes #2658 from turboFei/support_cleanup.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
We previously used the `jersey2` library for celeborn-openapi-client, and I found a missing-dependency issue in the shaded celeborn-openapi-client.
I raised [PR #2640] to fix it, but it seems difficult to maintain the transitive dependencies pulled in by jersey.
I then received a suggestion from Pan to migrate the library from jersey2 to `apache-httpclient`.
FYI, see https://openapi-generator.tech/docs/generators/java/
<img width="500" alt="image" src="https://github.com/user-attachments/assets/d102a7c9-46cd-4fd7-a2a0-7396a815776d">
To leverage the latest openapi-generator plugin, I upgraded openapi-generator to the latest version, 7.7.0, which requires JDK 11+.
Because Celeborn has not dropped Java 8 support so far, I included the generated code in the repo and added a user guide for regenerating it.
### Why are the changes needed?
To fix the dependency leak issue and make the dependencies easier to maintain.
### Does this PR introduce _any_ user-facing change?
No, this SDK has not been released, so no user-facing change.
### How was this patch tested?
Tested with a sample Maven project.
pom.xml:
```
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>test_openapi</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.celeborn</groupId>
            <artifactId>celeborn-openapi-client_2.12</artifactId>
            <version>0.6.0-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>
```
Testing code:
```
package org.example;

import org.apache.celeborn.rest.v1.master.MasterApi;
import org.apache.celeborn.rest.v1.master.WorkerApi;
import org.apache.celeborn.rest.v1.master.invoker.ApiClient;

public class Main {
    public static void main(String[] args) throws Exception {
        String cmUrl = "http://***:9098";
        MasterApi masterApi = new MasterApi(new ApiClient().setBasePath(cmUrl));
        System.out.println(masterApi.getMasterGroupInfo().getLeader().getAddress().split(":")[0]);
        WorkerApi workerApi = new WorkerApi(new ApiClient().setBasePath(cmUrl));
        System.out.println(workerApi.getWorkers());
        System.out.println(workerApi.getWorkerEvents());
    }
}
```
```
java -Dfile.encoding=UTF-8 -classpath /Users/fwang12/todo/test_openapi/target/classes:/Users/fwang12/todo/celeborn/openapi/openapi-client/target/celeborn-openapi-client_2.12-0.6.0-SNAPSHOT.jar org.example.Main
```
<img width="1727" alt="image" src="https://github.com/user-attachments/assets/2da8b126-be96-4c37-9a33-ba196024f2ba">
Closes #2641 from turboFei/appache_httpclient.
Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
as title
### Why are the changes needed?
Now, Celeborn doesn't support sinking shuffle data directly to Amazon S3, which could be a limitation when we're trying to move on-premises servers to AWS and use S3 as a data sink for shuffled data.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Closes #2579 from zhaohehuhu/dev-0619.
Authored-by: zhaohehuhu <luoyedeyi@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Shade the dependencies for openapi-client.
### Why are the changes needed?
To prevent dependency conflicts when the openapi-client is used in a client.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the UT.
Closes #2615 from turboFei/openapi_client_shade.
Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Now there are three different jackson versions in the server dependency list.
It is better to align them.
### Why are the changes needed?
To align the dependency versions and reduce the conflicts in the future.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the GA.
Closes #2620 from turboFei/align_jackson.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
This PR is for [CIP-9 Refine the celeborn RESTful APIs](https://docs.google.com/document/d/1LV2vV-w3XtlbJj2Vi4J77mt4IYCr40-8A_JncZLsHqs/edit?usp=sharing).
We leverage [openapi-generator](https://github.com/OpenAPITools/openapi-generator) to generate the client and model code.
### Why are the changes needed?
Celeborn has implemented RESTful APIs for monitoring and administrative operations on both master and worker endpoints. These APIs enable tasks such as configuration checks, status viewing of master/worker nodes, worker decommissioning/recommissioning, and more. They provide crucial insights and support for DevOps.
The primary concern with the existing API is the response content type, which is `text/plain` rather than the more widely accepted `application/json`. This mismatch makes integration with DevOps tools challenging, as these tools typically require JSON-formatted responses for seamless parsing and automation.
I also saw the need for REST API evolution in the [Apache Celeborn CLI Proposal](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI).
### Does this PR introduce _any_ user-facing change?
This PR introduces a new API namespace: `/api/v1`. This approach allows us to maintain the current API for compatibility while offering an improved version.
### How was this patch tested?
UT.
Closes #2599 from turboFei/cip_9_openapi.
Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>