### What changes were proposed in this pull request?
Inspired by [FLINK-38038](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-38038?filter=allissues), I used [Tongyi Lingma](https://lingma.aliyun.com/) and the qwen3-thinking LLM to identify and fix typos in the Celeborn codebase. For example:
- backLog → backlog
- won`t → won't
- can to be read → can be read
- mapDataPartition → mapPartitionData
- UserDefinePasswordAuthenticationProviderImpl → UserDefinedPasswordAuthenticationProviderImpl
### Why are the changes needed?
Remove typos to improve source code readability for users and ease development for developers.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Code and documentation cleanup does not require additional testing.
Closes #3356 from codenohup/fix-typo.
Authored-by: codenohup <huangxu.walker@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
1. Add a new sink that allows users to store metrics in files.
2. Celeborn scrapes its metrics periodically so that the metric data cannot grow large enough to cause an OOM.
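Draining timer samples on a schedule keeps them from piling up in memory the way the heap dump in this PR shows. The following is a minimal sketch under stated assumptions: `TimerSampleQueue`, `PeriodicScraper`, the capacity-based eviction and the `Consumer` sink are hypothetical names chosen for illustration, not the classes this PR adds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical queue of timer samples (nanoseconds); not Celeborn's actual metrics class.
class TimerSampleQueue {
    private final int capacity;
    private final ArrayDeque<Long> samples = new ArrayDeque<>();

    TimerSampleQueue(int capacity) {
        this.capacity = capacity;
    }

    // Record a sample; when the queue is full the oldest sample is evicted (assumed safety net).
    synchronized void record(long nanos) {
        if (samples.size() == capacity) {
            samples.pollFirst();
        }
        samples.addLast(nanos);
    }

    synchronized int size() {
        return samples.size();
    }

    // Remove and return all pending samples, oldest first.
    synchronized List<Long> drain() {
        List<Long> out = new ArrayList<>(samples);
        samples.clear();
        return out;
    }
}

class PeriodicScraper {
    // Drain the queue every periodMillis and hand the samples to a sink (e.g. a file writer).
    static ScheduledExecutorService start(TimerSampleQueue queue, long periodMillis,
                                          Consumer<List<Long>> sink) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> sink.accept(queue.drain()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Because the scraper drains on its own schedule, memory stays bounded even if nothing external ever reads the metrics; call `shutdown()` on the returned scheduler when the sink is closed.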
### Why are the changes needed?
A long-running worker ran out of memory, and its heap dump showed that the metrics were huge.
As the screenshots below show, the biggest object is the time metric queue, which held 1.6 million records.
<img width="1516" alt="Screenshot 2025-06-24 at 09 59 30" src="https://github.com/user-attachments/assets/691c7bc2-b974-4cc0-8d5a-bf626ab903c0" />
<img width="1239" alt="Screenshot 2025-06-24 at 14 45 10" src="https://github.com/user-attachments/assets/ebdf5a4d-c941-4f1e-911f-647aa156b37a" />
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
Cluster.
Closes #3346 from FMX/b2045.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <ethanfeng@apache.org>
Co-authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
Support testing the dynamic configuration management CLI.
### Why are the changes needed?
The tests of the dynamic configuration management CLI are disabled because dynamic configuration is not enabled in unit tests. Unit tests should be able to exercise the dynamic configuration management CLI.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`TestCelebornCliCommands`.
Closes #3340 from SteNicholas/CELEBORN-1056.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
Support upsert and delete of dynamic configuration management.
### Why are the changes needed?
Dynamic configuration management currently only provides an interface for listing dynamic configurations. It should also support upserting and deleting them.
### Does this PR introduce _any_ user-facing change?
- REST API:
  - `/api/v1/conf/dynamic/upsert` to upsert dynamic configurations.
  - `/api/v1/conf/dynamic/delete` to delete dynamic configurations.
- CLI:
  - `--upsert-dynamic-conf` to upsert dynamic configurations.
  - `--delete-dynamic-conf` to delete dynamic configurations.
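A client outside the CLI could reach the new endpoints over plain HTTP. The sketch below only builds the requests with the JDK's `java.net.http` API; the `POST` method, the JSON content type and whatever body the caller passes are assumptions, because the request schema is not described in this PR, and `DynamicConfRequests` is a hypothetical helper.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; the HTTP method and JSON body format are assumptions, not the documented contract.
class DynamicConfRequests {
    static HttpRequest upsert(String masterUrl, String jsonBody) {
        return post(masterUrl, "/api/v1/conf/dynamic/upsert", jsonBody);
    }

    static HttpRequest delete(String masterUrl, String jsonBody) {
        return post(masterUrl, "/api/v1/conf/dynamic/delete", jsonBody);
    }

    // Build (but do not send) a JSON POST to the given endpoint path.
    private static HttpRequest post(String masterUrl, String path, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(masterUrl + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Sending would then be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.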
### How was this patch tested?
- `ConfigServiceSuiteJ`
- `ApiV1BaseResourceSuite`
- `TestCelebornCliCommands`
Closes #3323 from SteNicholas/CELEBORN-1056.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
This PR is part of [CIP17: Interruption Aware Slot Selection](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-17%3A+Interruption+Aware+Slot+Selection).
It introduces a REST API that external services can use to notify the master about interruptions and their schedules.
### Why are the changes needed?
To notify the master of upcoming interruptions in the worker fleet. The master can then use these notices to proactively deprioritize workers that are likely to be interrupted sooner.
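One way the master might turn interruption notices into a slot-selection preference is to order candidate workers by their next known interruption time. A minimal sketch, assuming a hypothetical map from worker id to an epoch-millis interruption timestamp; it illustrates the idea and is not the CIP-17 implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class InterruptionAwareOrdering {
    // Workers with no known interruption come first, then those interrupted furthest in the
    // future; workers about to be interrupted go last. Ties keep their input order (stable sort).
    static List<String> prioritize(List<String> workers, Map<String, Long> interruptionTimes) {
        List<String> ordered = new ArrayList<>(workers);
        ordered.sort(Comparator.comparingLong(
                (String w) -> interruptionTimes.getOrDefault(w, Long.MAX_VALUE)).reversed());
        return ordered;
    }
}
```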
### Does this PR introduce _any_ user-facing change?
Yes, a new REST API.
### How was this patch tested?
Added unit tests.
Closes #3285 from akpatnam25/CELEBORN-2014.
Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Bump spark 4.0 version to 4.0.0.
### Why are the changes needed?
Spark 4.0.0 is ready.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA.
Closes #3282 from turboFei/spark_4.0.
Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Update DefaultIdentityProvider to return the tenantId and name from the configs provided by the user, reading them from the CelebornConf object used by LifeCycleManager.
### Why are the changes needed?
The tenant id and username passed by the user were not used, because DefaultIdentityProvider creates a new CelebornConf each time its provide function is called. As a result, the tenant id and username always fell back to their defaults. With these changes, we use the CelebornConf object held by LifeCycleManager, which includes the configs provided by the user.
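The difference between the old and new behavior comes down to which configuration object the provider reads. A minimal sketch, assuming a plain `Map<String, String>` in place of `CelebornConf`; `ConfBackedIdentityProvider`, `Identity` and the two config key names are hypothetical or assumed for illustration.

```java
import java.util.Map;

// Hypothetical identity value (tenant id and user name).
record Identity(String tenantId, String name) {}

class ConfBackedIdentityProvider {
    // Assumed key names, used here only for illustration.
    static final String TENANT_KEY = "celeborn.quota.identity.user-specific.tenant";
    static final String USER_KEY = "celeborn.quota.identity.user-specific.userName";
    static final String DEFAULT = "default";

    private final Map<String, String> conf;

    // Keep the caller's configuration (the role LifeCycleManager's CelebornConf plays in this PR)
    // instead of building a fresh, default-only configuration on every call.
    ConfBackedIdentityProvider(Map<String, String> conf) {
        this.conf = Map.copyOf(conf);
    }

    Identity provide() {
        return new Identity(conf.getOrDefault(TENANT_KEY, DEFAULT), conf.getOrDefault(USER_KEY, DEFAULT));
    }
}
```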
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
Added UTs and tested on staging setup
Closes #3214 from AmandeepSingh285/CELEBORN-1966.
Authored-by: amandeeps.28 <amandeeps.28@uber.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
1. The worker reports resourceConsumption to the master.
2. QuotaManager calculates the resourceConsumption of each app and marks the apps that exceed the quota:
   1. When a tenant's resourceConsumption exceeds the tenant's quota, select the app with larger consumption and mark it interrupted.
   2. When the cluster's resourceConsumption exceeds the cluster quota, select the app with larger consumption and mark it interrupted.
3. Through the heartbeat response, the master tells the driver whether the app is marked interrupted.
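The selection in step 2 can be read as a greedy policy: interrupt the largest consumers first until the remaining total fits within the quota. A minimal sketch of that reading; `QuotaEnforcer` and the plain `Map<String, Long>` of per-app consumption are hypothetical, and the real QuotaManager may select apps differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class QuotaEnforcer {
    // Given per-app consumption for one tenant (or the whole cluster), return the apps to mark
    // interrupted, largest consumers first, until the remaining total is within the quota.
    static List<String> appsToInterrupt(Map<String, Long> consumptionByApp, long quota) {
        long total = consumptionByApp.values().stream().mapToLong(Long::longValue).sum();
        List<Map.Entry<String, Long>> byUsageDesc = new ArrayList<>(consumptionByApp.entrySet());
        byUsageDesc.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        List<String> interrupted = new ArrayList<>();
        for (Map.Entry<String, Long> e : byUsageDesc) {
            if (total <= quota) {
                break;
            }
            interrupted.add(e.getKey());
            total -= e.getValue();
        }
        return interrupted;
    }
}
```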
### Why are the changes needed?
The current storage quota logic can only limit new shuffles; it cannot limit writes to existing shuffles. In our production environment we have the following scenario: the cluster is small, but a single shuffle of a user's app is large and occupies the disk resources, so we want to interrupt such shuffles.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
UTs.
Closes #2819 from leixm/CELEBORN-1577-2.
Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Support getting the workers' topology information via the RESTful API:
1. Return networkLocation in WorkerData.
2. Add a new API `/api/v1/workers/topology` to return the grouped workers' topology info.
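Grouping workers by their `networkLocation` is the core of a topology view like the one this endpoint returns. A minimal sketch, assuming a hypothetical `WorkerView` record that carries only a host and a network location (for example a rack path such as `/rack1`); the real response model may be shaped differently.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical minimal worker view; only host and networkLocation matter here.
record WorkerView(String host, String networkLocation) {}

class Topology {
    // Group worker hosts by network location, with locations sorted for stable output.
    static Map<String, List<String>> groupByLocation(List<WorkerView> workers) {
        return workers.stream().collect(Collectors.groupingBy(
                WorkerView::networkLocation, TreeMap::new,
                Collectors.mapping(WorkerView::host, Collectors.toList())));
    }
}
```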
### Why are the changes needed?
1. To get the workers' topology information.
2. To better understand rack awareness.
### Does this PR introduce _any_ user-facing change?
No breaking change.
### How was this patch tested?
UT and IT.
<img width="1008" alt="image" src="https://github.com/user-attachments/assets/6cb1aa2a-1160-4570-acb1-7602e2ce0b09" />
<img width="719" alt="image" src="https://github.com/user-attachments/assets/d26c3326-4837-40ad-a344-3cb4204bf607" />
Closes #3112 from turboFei/topology.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Use the Operation description instead of the ApiResponse description for RESTful APIs.
### Why are the changes needed?
To put the API description in the correct place.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Before:
<img width="1434" alt="image" src="https://github.com/user-attachments/assets/7a92c4a7-d550-4221-bee2-d52918719521" />
After:
<img width="1433" alt="image" src="https://github.com/user-attachments/assets/0287b425-7b56-4ef7-ba5d-4b53b4208780" />
Closes #3038 from turboFei/api_desc.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Minor fixes to the v1 RESTful APIs before the 0.6.0 release:
1. Update the API description to use the UPPER case worker EventType.
2. `subResourceConsumption` => `subResourceConsumptions`.
### Why are the changes needed?
1. With https://github.com/apache/celeborn/pull/2754, the openapi-sdk works well. However, for RESTful calls made without the SDK, the worker eventType is still case sensitive, possibly because of the Jersey issue mentioned in https://github.com/eclipse-ee4j/jersey/issues/5288. So in this PR, I change the description in the Swagger spec to guide users.
<img width="1524" alt="image" src="https://github.com/user-attachments/assets/70e4f239-dc36-47bc-902e-5340986f014a" />
2. rename `subResourceConsumption` to `subResourceConsumptions`.
### Does this PR introduce _any_ user-facing change?
No, the api has not been released.
### How was this patch tested?
GA.
Closes #3023 from turboFei/restful_minor_fix.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Rename the RPC metrics from `${name}_${metric}` to `Rpc${metric}{name=$name}` so that they are easy to add to the Grafana dashboard.
2. Use the MASTER/WORKER/CLIENT role for the RPC env.
3. Add the RPC metrics to the Grafana dashboard.
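The renaming in item 1 moves the endpoint name out of the metric name and into a label, so one dashboard query can cover every endpoint. A minimal sketch of the two naming templates; `RpcMetricNames` and the example endpoint and metric names are hypothetical.

```java
class RpcMetricNames {
    // Old template: the endpoint name is baked into the metric name, e.g. "MasterEndpoint_QueueLength".
    static String legacyName(String name, String metric) {
        return name + "_" + metric;
    }

    // New template from this PR: a fixed "Rpc${metric}" family with the endpoint as a label.
    static String labeledName(String name, String metric) {
        return "Rpc" + metric + "{name=" + name + "}";
    }
}
```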
### Why are the changes needed?
For monitoring
### Does this PR introduce _any_ user-facing change?
No, it has not been released
### How was this patch tested?
UT for metrics source `instance`.
<img width="1456" alt="image" src="https://github.com/user-attachments/assets/90284390-54ad-49ef-a868-fa537d2301b8">
<img width="1880" alt="image" src="https://github.com/user-attachments/assets/e8101e47-d649-4c66-9978-1efb4faa047f">
Closes #2990 from turboFei/rpc_metrics.
Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Retry on BindException when starting the master/worker HTTP server.
2. Record the used ports and pre-check whether the selected port is already used or bound before binding.
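Both steps can be combined in a small bind loop. A minimal sketch using `java.net.ServerSocket`; `PortBinder`, the sequential port-increment strategy and the wrap-around into the non-privileged range are assumptions for illustration, not the code in this PR.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PortBinder {
    // Ports this process has already handed out; checked before binding (step 2).
    private static final Set<Integer> USED_PORTS = ConcurrentHashMap.newKeySet();

    // Try basePort, basePort + 1, ... for up to maxRetries extra attempts, retrying on BindException (step 1).
    static ServerSocket bindWithRetry(int basePort, int maxRetries) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int port = basePort + attempt;
            if (port > 65535) {
                port = 1024 + (port - 65536); // wrap around into the non-privileged range
            }
            if (!USED_PORTS.add(port)) {
                continue; // pre-check: this process already used the port
            }
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (BindException e) {
                socket.close();
                USED_PORTS.remove(port);
                last = e;
            }
        }
        throw last != null ? last : new BindException("no free port starting at " + basePort);
    }
}
```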
### Why are the changes needed?
To fix flaky test.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA.
Closes #2906 from turboFei/retry_master_suite.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Add a histogram metric.
2. Collect critical metrics about chunk fetching.
### Why are the changes needed?
1. To find out the IO pattern of chunk fetching.
2. To have detailed metrics about chunk fetch time.
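A histogram shows the distribution of fetch times, which an average hides. A minimal fixed-bucket sketch; `BucketHistogram` and its bucket bounds are hypothetical, and the histogram added in this PR may be implemented differently.

```java
import java.util.Arrays;

// Hypothetical fixed-bucket latency histogram for illustration only.
class BucketHistogram {
    private final long[] upperBoundsMillis; // inclusive upper bound of each bucket, ascending
    private final long[] counts;            // one extra slot for values above the last bound

    BucketHistogram(long... upperBoundsMillis) {
        this.upperBoundsMillis = upperBoundsMillis.clone();
        this.counts = new long[upperBoundsMillis.length + 1];
    }

    // Count the value in the first bucket whose upper bound is >= the value.
    synchronized void update(long valueMillis) {
        int i = 0;
        while (i < upperBoundsMillis.length && valueMillis > upperBoundsMillis[i]) {
            i++;
        }
        counts[i]++;
    }

    synchronized long[] snapshot() {
        return Arrays.copyOf(counts, counts.length);
    }
}
```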
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA and cluster.
<img width="940" alt="Screenshot 2024-12-09 15 42 50" src="https://github.com/user-attachments/assets/9f526103-c162-4607-a031-ba90f42ae83e">
<img width="962" alt="Screenshot 2024-12-09 15 42 56" src="https://github.com/user-attachments/assets/c17822da-0433-4701-b0cc-0887ac970353">
Closes #2983 from FMX/b1766.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Report the `resourceConsumptionSnapshot` from `storageManager` directly in the worker heartbeat; it does not contain empty user resource consumptions.
2. For the RESTful API, do not return empty user resource consumptions either.
### Why are the changes needed?
878a83cfa7/common/src/main/scala/org/apache/celeborn/common/meta/WorkerInfo.scala (L239-L248)
Currently, we never remove a user's resource consumption, even when its sub resource consumptions are empty; instead, we create a `ResourceConsumption(0, 0, 0, 0)`.
I am afraid that the worker will report more and more empty user resource consumptions to the master once any of a user's slots have been assigned to this worker.
For example:
<img width="813" alt="image" src="https://github.com/user-attachments/assets/64932552-dc29-4a43-aed4-557419628b23">
So, I think we just need to report the `resourceConsumptionSnapshot` from `storageManager` directly, which does not contain the empty user resource consumption.
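The intended result is that the heartbeat only carries users with non-zero consumption. The PR achieves this by reporting the storage manager's snapshot directly; the filter below is only a minimal sketch of what the reported map should and should not contain. `Consumption` and `HeartbeatReport` are hypothetical, and the four counter names are assumed to match the four fields of `ResourceConsumption(0, 0, 0, 0)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mirror of the four counters in ResourceConsumption; field names are assumed.
record Consumption(long diskBytesWritten, long diskFileCount, long hdfsBytesWritten, long hdfsFileCount) {
    boolean isEmpty() {
        return diskBytesWritten == 0 && diskFileCount == 0 && hdfsBytesWritten == 0 && hdfsFileCount == 0;
    }
}

class HeartbeatReport {
    // Keep only users with some consumption, preserving the input order.
    static Map<String, Consumption> nonEmpty(Map<String, Consumption> byUser) {
        Map<String, Consumption> out = new LinkedHashMap<>();
        byUser.forEach((user, c) -> {
            if (!c.isEmpty()) {
                out.put(user, c);
            }
        });
        return out;
    }
}
```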
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA.
Closes #2967 from turboFei/reduce_report.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
This information is useful for the automation tool integration.
Our automation tool periodically queries the status of all workers by calling the master's `/api/v1/worker`.
During decommissioning, when a worker is in the IDLE state, we need to check whether there is still unreleased shuffle data on it, so that we can shut the node down without impacting users.
Previously, I had to call the worker's `/api/v1/shuffles` to check that.
It is better to get all the information from the Celeborn master, because the master is HA-enabled and always reachable.
So this PR returns the resource consumption as a structured object for automation tool integration.
### Does this PR introduce _any_ user-facing change?
No, this RESTful api has not been released.
### How was this patch tested?
GA.
Closes #2955 from turboFei/worker_info_object.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
Remove the app top disk usage code from both the master and the worker.
Use the Prometheus expression below instead to find the apps with the top disk usage:
```
topk(50, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))
```
### Why are the changes needed?
To address comments: https://github.com/apache/celeborn/pull/2947#issuecomment-2499564978
> Due to the application dimension resource consumption, this feature should be included in the deprecated features. Maybe you can remove the codes for application top disk usage.
### Does this PR introduce _any_ user-facing change?
Yes, the app top disk usage API is removed.
### How was this patch tested?
GA.
Closes #2949 from turboFei/remove_app_top_usage.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support predefined tags expressions for tenants and users via dynamic config. With this, admins can configure tags for users/tenants and grant special users permission to provide custom tags expressions.
### Why are the changes needed?
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-11+Supporting+Tags+in+Celeborn
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
UTs
Closes #2936 from s0nskar/admin_tags.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
as title
### Why are the changes needed?
AWS S3 doesn't support append, so Celeborn had to copy the historical data from S3 to the worker and write it to S3 again, which heavily amplifies writes. This PR implements a better solution using multipart upload (MPU) to avoid copy-and-write.
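With multipart upload, appended data can be written to S3 as a sequence of parts instead of re-reading and rewriting the whole object. A minimal sketch of the buffering side, assuming a hypothetical `PartUploader` callback in place of the real S3 `UploadPart` call; the 5 MiB minimum size for every part except the last and the 10,000-part limit are S3's multipart upload constraints.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical buffer that turns appended shuffle bytes into multipart-upload parts.
class MultipartBuffer {
    static final int MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for all parts but the last
    static final int MAX_PARTS = 10_000;              // S3 part numbers run 1..10000

    interface PartUploader { // stand-in for an S3 UploadPart call
        void upload(int partNumber, byte[] data);
    }

    private final int partSize;
    private final PartUploader uploader;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private int nextPartNumber = 1;

    MultipartBuffer(int partSize, PartUploader uploader) {
        if (partSize < MIN_PART_SIZE) throw new IllegalArgumentException("part size below 5 MiB");
        this.partSize = partSize;
        this.uploader = uploader;
    }

    // Append without rewriting earlier data: upload full parts, keep the remainder buffered.
    void append(byte[] data) {
        pending.write(data, 0, data.length);
        while (pending.size() >= partSize) {
            byte[] all = pending.toByteArray();
            uploadPart(Arrays.copyOfRange(all, 0, partSize));
            pending.reset();
            pending.write(all, partSize, all.length - partSize);
        }
    }

    // Upload the final (possibly smaller) part; CompleteMultipartUpload would follow.
    int finish() {
        if (pending.size() > 0) {
            uploadPart(pending.toByteArray());
            pending.reset();
        }
        return nextPartNumber - 1; // number of parts uploaded
    }

    private void uploadPart(byte[] part) {
        if (nextPartNumber > MAX_PARTS) throw new IllegalStateException("too many parts");
        uploader.upload(nextPartNumber++, part);
    }
}
```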
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

I conducted an experiment with a 1GB input dataset to compare the performance of Celeborn using only S3 storage versus using SSD storage. The results showed that Celeborn with SSD storage was approximately three times faster than with only S3 storage.
<img width="1728" alt="Screenshot 2024-11-16 at 13 02 10" src="https://github.com/user-attachments/assets/8f879c47-c01a-4004-9eae-1c266c1f3ef2">
The screenshot above is from my second test, with 5000 mappers and reducers.
Closes #2830 from zhaohehuhu/dev-1021.
Lead-authored-by: zhaohehuhu <luoyedeyi@163.com>
Co-authored-by: He Zhao <luoyedeyi459@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Adding support for worker tags in DBConfigService.
### Why are the changes needed?
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-11+Supporting+Tags+in+Celeborn
### Does this PR introduce _any_ user-facing change?
Users of DBConfigService can upgrade their DB by following the README included in this PR:
```
% mysql --verbose
mysql> use <configstore_db_name>;
Database changed
mysql> source upgrade-0.5.0-to-0.6.0-mysql.sql
```
### How was this patch tested?
- UTs
- Verified the DB scripts locally
Closes #2925 from s0nskar/db_config.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Audit all RESTful API calls and write them to a separate restAuditFile.
### Why are the changes needed?
To audit all API calls and write them to a separate log file so that the audit log is easy to check.
### Does this PR introduce _any_ user-facing change?
No, this feature has not been released.
### How was this patch tested?
```
build/sbt "clean;celeborn-master/testOnly *ApiV1MasterResourceSuite"
```
<img width="1714" alt="image" src="https://github.com/user-attachments/assets/7b94fd89-005b-4f48-ab24-cc4ae7f473e5">
Closes #2895 from turboFei/rest_audit_log4j.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
1. `GET /api/v1/applications/deleteApps` -> `DELETE /api/v1/applications`
2. `GET /api/v1/applications/reviseLostShuffles` -> `POST /api/v1/applications/revise_lost_shuffles`
### Why are the changes needed?
Followup for https://github.com/apache/celeborn/pull/2746
### Does this PR introduce _any_ user-facing change?
No, these APIs have not been released yet.
### How was this patch tested?
GA.
Closes #2892 from turboFei/delete_app.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve `ThreadStackTrace` with `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, `inNative` for thread dump.
### Why are the changes needed?
`ThreadStackTrace` currently does not include `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended` or `inNative`. It is recommended to improve `ThreadStackTrace` so that the thread dump shows more details of each thread's stack trace.
### Does this PR introduce _any_ user-facing change?
The response of `ThreadStack` in `/api/v1/thread_dump` adds `synchronizers`, `monitors`, `lockName`, `lockOwnerName`, `suspended`, `inNative` fields.
Cherry pick:
- https://github.com/apache/spark/pull/42575
- https://github.com/apache/spark/pull/43095
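The new fields map closely onto what the JDK already exposes through `java.lang.management.ThreadInfo`. A minimal sketch that collects them with `ThreadMXBean.dumpAllThreads(true, true)`; `ThreadDumpSketch` and its `ThreadStack` record are a hypothetical flattened view, not the REST model class.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class ThreadDumpSketch {
    // Hypothetical flattened view of the fields named in this PR.
    record ThreadStack(long threadId, String threadName, List<String> synchronizers, List<String> monitors,
                       String lockName, String lockOwnerName, boolean suspended, boolean inNative) {}

    static List<ThreadStack> dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // true, true: also collect locked monitors and locked ownable synchronizers
        ThreadInfo[] infos = bean.dumpAllThreads(true, true);
        List<ThreadStack> out = new ArrayList<>();
        for (ThreadInfo info : infos) {
            List<String> syncs = new ArrayList<>();
            for (LockInfo lock : info.getLockedSynchronizers()) {
                syncs.add(lock.toString());
            }
            List<String> monitors = new ArrayList<>();
            for (MonitorInfo monitor : info.getLockedMonitors()) {
                monitors.add(monitor + " at depth " + monitor.getLockedStackDepth());
            }
            out.add(new ThreadStack(info.getThreadId(), info.getThreadName(), syncs, monitors,
                    info.getLockName(), info.getLockOwnerName(), info.isSuspended(), info.isInNative()));
        }
        return out;
    }
}
```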
### How was this patch tested?
`ApiV1BaseResourceSuite#thread_dump`
Closes #2888 from SteNicholas/CELEBORN-1697.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Remove the Scala binary version from the openapi-client artifactId.
2. Skip the openapi-client doc compilation, which was missed in https://github.com/apache/celeborn/pull/2641.
### Why are the changes needed?
Because the openapi-client is a pure Java module.
### Does this PR introduce _any_ user-facing change?
No, it has not been released.
### How was this patch tested?
GA.
Closes #2861 from turboFei/remove_Scala.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
- Added a reset method for DynamicConfigServiceFactory
- Cleaned up QuotaManagerSuite
### Why are the changes needed?
Without this change, we cannot initialize a new configService in other tests.
For example, the tests for https://github.com/apache/celeborn/pull/2844 fail because of this issue.
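The failure mode is a process-wide cached service that ignores later configuration. A minimal sketch of the problem and of the reset hook; `ConfigServiceFactory` and `ConfigService` here are hypothetical stand-ins, not `DynamicConfigServiceFactory`'s actual signatures.

```java
// Hypothetical stand-in for a config service built from one config file.
record ConfigService(String configPath) {}

// Hypothetical factory that caches the first service it creates, like a process-wide singleton.
final class ConfigServiceFactory {
    private static ConfigService cached;

    private ConfigServiceFactory() {}

    static synchronized ConfigService getConfigService(String configPath) {
        if (cached == null) {
            cached = new ConfigService(configPath);
        }
        // Later callers get the cached instance even if they pass a different path.
        return cached;
    }

    // Test-only hook: drop the cached instance so the next call builds a fresh service.
    static synchronized void reset() {
        cached = null;
    }
}
```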
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
NA
Closes #2848 from s0nskar/fix_quotatest.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Support revising lost shuffle IDs in long-running jobs such as Flink batch jobs.
### Why are the changes needed?
1. To support revising lost shuffles.
2. To add an HTTP endpoint to revise lost shuffles manually.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
Cluster tests.
Closes #2746 from FMX/b1600.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Make CongestionController support dynamic config.
### Why are the changes needed?
Currently, Celeborn only supports quota management based on disk file bytes/count, and this quota management cannot cope with sudden increases in traffic, which can harm the cluster.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
UT.
Closes #2817 from leixm/CELEBORN-1487-2.
Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a REST API and CLI for container info. Users can configure this API to be backed by whichever cluster manager they use.
### Why are the changes needed?
see above
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
added UTs
Closes #2758 from akpatnam25/CELEBORN-1599.
Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add support for reading worker tags in FSConfigService (DBConfigService will be handled in a separate PR). Tags are part of the SYSTEM-level config; the dynamic config file structure with tags is shown below:
```yaml
-  level: SYSTEM
   config:
     celeborn.test.int.only: 100
   tags:
     tag1:
       - 'host1:1111'
       - 'host2:2222'
     tag2:
       - 'host3:3333'
       - 'host4:4444'
```
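Given a tag-to-workers mapping like the `tags` section above, resolving a set of tags to workers amounts to intersecting the host lists of the requested tags. A minimal sketch, assuming a comma-separated expression in which a worker must carry every listed tag; `TagMatcher` and that expression syntax are hypothetical, and the real matching rules may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TagMatcher {
    // tagsToWorkers: tag -> list of "host:port", as in the SYSTEM-level tags config.
    // Returns the workers that carry every tag in the (assumed) comma-separated expression.
    static List<String> workersFor(String tagsExpr, Map<String, List<String>> tagsToWorkers) {
        List<String> tags = Arrays.stream(tagsExpr.split(","))
                .map(String::trim).filter(t -> !t.isEmpty()).toList();
        if (tags.isEmpty()) {
            return List.of();
        }
        List<String> result = new ArrayList<>(tagsToWorkers.getOrDefault(tags.get(0), List.of()));
        for (String tag : tags.subList(1, tags.size())) {
            Set<String> tagged = Set.copyOf(tagsToWorkers.getOrDefault(tag, List.of()));
            result.retainAll(tagged);
        }
        return result;
    }
}
```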
### Why are the changes needed?
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-11+Supporting+Tags+in+Celeborn
### Does this PR introduce _any_ user-facing change?
- Changes are backward compatible.
- User will be able to pass worker tags from dynamicConfig.yaml file.
### How was this patch tested?
UTs
Closes #2766 from s0nskar/tags_config.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Add `instanceLabel` to the metrics source, preferring `FQDN:port` over `ip:port` when `celeborn.network.bind.preferIpAddress=false` (previously `ip:port` was used even with this setting).
2. Add the variable `instance` with `label_values(metrics_JVMCPUTime_Value, instance)`, the same as in `celeborn-jvm-dashboard.json`.
3. Add the filter `instance=~"${instance}"` to every metric.
4. Add the missing `legendFormat` for the memory file storage metric expressions.
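The label choice in item 1 can be expressed as a small function. A minimal sketch, assuming the hypothetical `InstanceLabel` helper resolves the FQDN with `InetAddress.getCanonicalHostName()`; Celeborn's actual host resolution may differ.

```java
import java.net.InetAddress;

class InstanceLabel {
    // preferIpAddress mirrors celeborn.network.bind.preferIpAddress:
    // true -> "ip:port", false -> "FQDN:port" (falls back to the IP if reverse lookup fails).
    static String of(InetAddress address, int port, boolean preferIpAddress) {
        String host = preferIpAddress ? address.getHostAddress() : address.getCanonicalHostName();
        return host + ":" + port;
    }
}
```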
### Why are the changes needed?
There can be many Celeborn instances in a production deployment, so it is better to add an instance filter.
### Does this PR introduce _any_ user-facing change?
Yes, it introduces a new variable.
However, the default value of `instance` is `ALL`, so the behavior is the same as before.
### How was this patch tested?
Config: `celeborn.network.bind.preferIpAddress=false`
<img width="1141" alt="image" src="https://github.com/user-attachments/assets/c3161069-790a-4cb2-8654-6d52cf8e5fb0">
<img width="944" alt="image" src="https://github.com/user-attachments/assets/293b8bd4-252a-459c-aa86-5f4aa75eb594">
<img width="939" alt="image" src="https://github.com/user-attachments/assets/1e1b28af-dd71-4c5b-8285-57473a6c9650">
For JVM metrics, before it was ip:port, and now it is FQDN:port.
<img width="947" alt="image" src="https://github.com/user-attachments/assets/fe00762f-605d-4b5e-b0a4-c586bdc0ec1a">
Closes #2777 from turboFei/legend_base.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Start the HTTP server after all handlers are added.
### Why are the changes needed?
To avoid exposing the RESTful server before all handlers are initialized.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UT.
Closes #2764 from turboFei/metrics_not_found.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Support SSL for celeborn RESTful service.
### Why are the changes needed?
For HTTP SSL connection requirements.
### Does this PR introduce _any_ user-facing change?
No, SSL is disabled by default.
### How was this patch tested?
Integration testing.
```
celeborn.master.http.ssl.enabled=true
celeborn.master.http.ssl.keystore.path=/hadoop/keystore.jks
celeborn.master.http.ssl.keystore.password=xxxxxxx
```
<img width="1143" alt="image" src="https://github.com/user-attachments/assets/2334561d-1de3-4b38-bc80-5d5d86d3b8ff">
<img width="695" alt="image" src="https://github.com/user-attachments/assets/e3877468-cc3b-4a4a-bf75-2994f557a104">
Closes #2756 from turboFei/HADP_1609_ssl2.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Enable `useEnumCaseInsensitive` for openapi-generator.
On the Celeborn server side, the enum is then mapped to Celeborn's internal WorkerEventType.
### Why are the changes needed?
I hit an exception when sending a worker event with the OpenAPI SDK:
```
Exception in thread "main" ApiException{code=400, responseHeaders={Server=[Jetty(9.4.52.v20230823)], Content-Length=[491], Date=[Fri, 20 Sep 2024 23:50:27 GMT], Content-Type=[text/plain]}, responseBody='Cannot deserialize value of type `org.apache.celeborn.rest.v1.model.SendWorkerEventRequest$EventTypeEnum` from String "DecommissionThenIdle": not one of the values accepted for Enum class: [DECOMMISSION_THEN_IDLE, GRACEFUL, NONE, DECOMMISSION, IMMEDIATELY, RECOMMISSION]
at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 14] (through reference chain: org.apache.celeborn.rest.v1.model.SendWorkerEventRequest["eventType"])'}
at org.apache.celeborn.rest.v1.master.invoker.ApiClient.processResponse(ApiClient.java:913)
at org.apache.celeborn.rest.v1.master.invoker.ApiClient.invokeAPI(ApiClient.java:1000)
at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:378)
at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:334)
at org.example.Main.main(Main.java:22)
```
The test code to reproduce it:
```java
package org.example;

import org.apache.celeborn.rest.v1.master.WorkerApi;
import org.apache.celeborn.rest.v1.master.invoker.ApiClient;
import org.apache.celeborn.rest.v1.model.ExcludeWorkerRequest;
import org.apache.celeborn.rest.v1.model.SendWorkerEventRequest;
import org.apache.celeborn.rest.v1.model.WorkerId;

public class Main {
  public static void main(String[] args) throws Exception {
    String cmUrl = "http://localhost:9098";
    WorkerApi workerApi = new WorkerApi(new ApiClient().setBasePath(cmUrl));

    // Excluding a worker goes through.
    workerApi.excludeWorker(new ExcludeWorkerRequest()
        .addAddItem(new WorkerId()
            .host("localhost")
            .rpcPort(1)
            .pushPort(2)
            .fetchPort(3)
            .replicatePort(4)));

    // Sending DECOMMISSION_THEN_IDLE fails with the HTTP 400 shown above.
    workerApi.sendWorkerEvent(new SendWorkerEventRequest()
        .addWorkersItem(new WorkerId()
            .host("127.0.0.1")
            .rpcPort(56116)
            .pushPort(56117)
            .fetchPort(56119)
            .replicatePort(56118))
        .eventType(SendWorkerEventRequest.EventTypeEnum.DECOMMISSION_THEN_IDLE));
  }
}
```
This seems to happen because the names and values of EventTypeEnum differ.
I am not sure why the UT passed while the integration testing failed.
Because the EventTypeEnum value is case sensitive, we hit this issue.
8734d16638/openapi/openapi-client/src/main/java/org/apache/celeborn/rest/v1/model/SendWorkerEventRequest.java (L47-L83)
I think this is related to a Jersey issue: https://github.com/eclipse-ee4j/jersey/issues/5288
In this PR, `useEnumCaseInsensitive` is enabled for openapi-generator.
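The mismatch is between the enum constant names (`DECOMMISSION_THEN_IDLE`) and the serialized values (`DecommissionThenIdle`). A minimal sketch of a lookup that tolerates both forms regardless of case; `EventType` is a hypothetical stand-in for the generated `EventTypeEnum`, the CamelCase values other than `DecommissionThenIdle` are assumed, and matching on the constant name as well as the value is an illustration choice, not necessarily what openapi-generator emits.

```java
// Hypothetical enum shaped like the generated EventTypeEnum: UPPER_SNAKE_CASE names,
// CamelCase serialized values.
enum EventType {
    NONE("None"),
    IMMEDIATELY("Immediately"),
    DECOMMISSION("Decommission"),
    DECOMMISSION_THEN_IDLE("DecommissionThenIdle"),
    GRACEFUL("Graceful"),
    RECOMMISSION("Recommission");

    private final String value;

    EventType(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }

    // Case-insensitive lookup by serialized value or by constant name.
    static EventType fromValue(String text) {
        for (EventType t : values()) {
            if (t.value.equalsIgnoreCase(text) || t.name().equalsIgnoreCase(text)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unexpected value '" + text + "'");
    }
}
```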
### Does this PR introduce _any_ user-facing change?
No, there is no user-facing change, and this SDK has not been released yet.
### How was this patch tested?
Existing UT and Integration testing.
<img width="1265" alt="image" src="https://github.com/user-attachments/assets/6a34a0dd-c474-4e8d-b372-19b0fda94972">
Closes #2754 from turboFei/eventTypeEnumMapping.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
The server module is missing the dependency checks.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA
Closes #2742 from cxzl25/check_server_deps.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix the unique key to reference the correct column names.
### Why are the changes needed?
Running the current DB scripts gives the error below, because the `user` column was renamed to `name` (https://github.com/apache/celeborn/pull/2340) but the unique key was not updated accordingly.
```
mysql> CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config
-> (
-> id int NOT NULL AUTO_INCREMENT,
-> cluster_id int NOT NULL,
-> tenant_id varchar(255) NOT NULL,
-> level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,USER',
-> name varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
-> config_key varchar(255) NOT NULL,
-> config_value varchar(255) NOT NULL,
-> type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
-> gmt_create timestamp NOT NULL,
-> gmt_modify timestamp NOT NULL,
-> PRIMARY KEY (id),
-> UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `user`, `config_key`)
-> );
ERROR 1072 (42000): Key column 'user' doesn't exist in table
```
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
Tested on a local DB:
```
mysql> CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config
-> (
-> id int NOT NULL AUTO_INCREMENT,
-> cluster_id int NOT NULL,
-> tenant_id varchar(255) NOT NULL,
-> level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,USER',
-> name varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
-> config_key varchar(255) NOT NULL,
-> config_value varchar(255) NOT NULL,
-> type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
-> gmt_create timestamp NOT NULL,
-> gmt_modify timestamp NOT NULL,
-> PRIMARY KEY (id),
-> UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `name`, `config_key`)
-> );
Query OK, 0 rows affected (0.01 sec)
```
Closes #2740 from s0nskar/fix-db-script.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Remove quota.yaml, which seems outdated.
2. Remove the config item set in dynamicConfig.yaml.template, because it is not a dynamic config.
3. Prevent an NPE when loading dynamicConfig.yaml.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UT.
Closes #2737 from turboFei/dynamic_config.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Support wildcard binding for the RPC and HTTP servers. When the wildcard address is used, the service can listen for both IPv4 and IPv6 traffic in dual-stack environments.
The specific scenario where this becomes relevant is as follows:
If some of the compute infrastructure is IPv4-only, some is IPv6-only, and the rest is dual-stack, a single Celeborn deployment can serve all of them by:
a) Setting `bind.preferip` to false, so that the advertised address is the hostname rather than the IP.
b) Binding to the wildcard address.
With both in place, IPv4-only compute nodes resolve the IPv4 address and connect to the IPv4 IP/port.
Likewise for IPv6-only nodes.
Dual-stack compute nodes use the Java `java.net.preferIPv6Addresses` flag to resolve to either the IPv4 or the IPv6 address.
This is how we handle this combination of scenarios internally.
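As an illustration of what "bind to the wildcard address" means at the socket level, here is a minimal sketch using plain `java.net`. It is not Celeborn's actual (Netty-based) server code, and `WildcardBindSketch`/`bindWildcard` are hypothetical names. `new InetSocketAddress(port)` with no host selects the wildcard address; whether one such socket accepts both IPv4 and IPv6 traffic still depends on the OS dual-stack settings and on `java.net.preferIPv4Stack` not being set.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class WildcardBindSketch {
  // Hypothetical helper: binds a server socket to the wildcard address.
  // Port 0 asks the OS for an ephemeral port.
  static ServerSocket bindWildcard(int port) throws IOException {
    ServerSocket server = new ServerSocket();
    try {
      // InetSocketAddress(int) uses the wildcard (any-local) address, so on a
      // typical dual-stack host the socket can serve IPv4 and IPv6 clients.
      server.bind(new InetSocketAddress(port));
      return server;
    } catch (IOException e) {
      server.close(); // do not leak the socket if binding fails
      throw e;
    }
  }
}
```

Checking with `netstat` and `nc -4` / `nc -6`, as in the testing notes, is a practical way to confirm the socket really is reachable over both protocols on a given host.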
### Why are the changes needed?
See above.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested on a server using netstat, and connected via `nc -4` and `nc -6` to verify that both IPv4 and IPv6 connections succeeded.
Closes #2713 from akpatnam25/CELEBORN-1513-fix.
Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Use the request base URL as the Swagger server URL, to prevent the Swagger server from being unreachable and to avoid `CORS` errors when the Swagger server URLs do not match.
Currently, if the HTTP host is bound to a local address, the Swagger server is not reachable.
For example:
```
celeborn.master.http.host=0.0.0.0
celeborn.worker.http.host=0.0.0.0
```
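The idea of "using the request base URL" can be sketched as follows, assuming the full URL of the incoming request is available as a string. `SwaggerBaseUrl` and `baseUrl` are hypothetical names; the PR's actual implementation may obtain the URL differently (for example, from the servlet request).

```java
import java.net.URI;
import java.net.URISyntaxException;

class SwaggerBaseUrl {
  // Hypothetical helper: reduces a request URL to "scheme://host[:port]".
  // Advertising this origin as the Swagger server makes the UI call back to
  // the same host the browser used, instead of a bind address like 0.0.0.0.
  static String baseUrl(String requestUrl) throws URISyntaxException {
    URI uri = new URI(requestUrl);
    // Keep only scheme, host and port; drop user info, path, query, fragment.
    return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(), null, null, null)
        .toString();
  }
}
```

For a request to `http://celeborn-master:9098/swagger?x=1`, this sketch yields `http://celeborn-master:9098`, which is reachable from the browser by construction.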
### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No, it only uses the request base URL as the Swagger server URL.
### How was this patch tested?
Integration testing:
<img width="1483" alt="image" src="https://github.com/user-attachments/assets/f8465bcb-a266-4532-9f11-52e9374c56a5">
<img width="1423" alt="image" src="https://github.com/user-attachments/assets/84eacd86-7ba3-4f14-907b-6d529c6e0e28">
Closes #2674 from turboFei/swagger_server.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR aims to introduce `warn-unused-import` in Scala.
### Why are the changes needed?
There are currently many unused imports, which can be detected with `-Ywarn-unused-import`.
The `silencer` plugin lets us suppress the warning for imports that are still required for Scala 2.11, such as:
```scala
import org.apache.celeborn.common.util.FunctionConverter._
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA
Closes #2689 from cxzl25/CELEBORN-1565.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shaoyun Chen <csy@apache.org>
### What changes were proposed in this pull request?
Introduce `ConfigStore` so that `celeborn.dynamicConfig.store.backend` can be specified either by short name or by backend implementation class.
### Why are the changes needed?
`celeborn.dynamicConfig.store.backend` is allowed to be specified in two ways:
- Using short names: Default available options are FS, DB.
- Using the fully qualified class name of the backend implementation.
Therefore, it is better to introduce `ConfigStore`, based on the SPI mechanism, for `celeborn.dynamicConfig.store.backend` instead of `dynamicConfigStoreBackendShortNames`, which does not allow users to add short names for other backend implementations.
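The two ways of specifying the backend can be sketched with `java.util.ServiceLoader`. This is a hypothetical illustration, not Celeborn's code: the `ConfigStore` interface shape (a `getName()` returning the short name), `ConfigStoreResolver`, and the `FsConfigStore`/`DbConfigStore` stubs are all assumptions made for the example.

```java
import java.util.ServiceLoader;

class ConfigStoreResolver {
  // Hypothetical SPI shape: each backend reports a short name such as "FS" or "DB".
  interface ConfigStore {
    String getName();
  }

  static class FsConfigStore implements ConfigStore {
    public String getName() { return "FS"; }
  }

  static class DbConfigStore implements ConfigStore {
    public String getName() { return "DB"; }
  }

  // Resolves a backend value that is either a short name or a fully
  // qualified implementation class name.
  static ConfigStore resolve(String backend, Iterable<? extends ConfigStore> candidates)
      throws ReflectiveOperationException {
    // 1. Short name: match against the discovered implementations.
    for (ConfigStore store : candidates) {
      if (store.getName().equalsIgnoreCase(backend)) {
        return store;
      }
    }
    // 2. Otherwise treat the value as a class name and instantiate it.
    Class<?> clazz = Class.forName(backend);
    return (ConfigStore) clazz.getDeclaredConstructor().newInstance();
  }

  // In a real deployment, candidates come from ServiceLoader, which discovers
  // implementations listed under META-INF/services, so users can register
  // new short names without changing Celeborn itself.
  static ConfigStore resolve(String backend) throws ReflectiveOperationException {
    return resolve(backend, ServiceLoader.load(ConfigStore.class));
  }
}
```

The design point is that short names are no longer a hard-coded list: any implementation registered through the SPI contributes its own name.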
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #2698 from SteNicholas/CELEBORN-1550.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add support for custom dynamic config store backend implementations: users can now pass their own implementation for the dynamic config store backend.
This change also keeps backward compatibility by still supporting short backend names such as "FS" and "DB".
### Why are the changes needed?
Currently, Celeborn only supports file-based and DB-based backends, while there may be other ways of managing these configs.
### Does this PR introduce _any_ user-facing change?
No, user-facing behaviour remains the same.
### How was this patch tested?
Existing UTs verify that this change works for the "FS" and "DB" implementations.
Closes #2670 from s0nskar/dynamic_config.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR is a follow-up of CELEBORN-1521. It moves `BasicPrincipal` into the celeborn-spi module so that users do not need to implement it themselves.
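For context, a principal of this kind is typically a small, name-only implementation of `java.security.Principal`. The sketch below is hypothetical and deliberately named `NamedPrincipal` so it is not mistaken for the actual `BasicPrincipal` source in celeborn-spi, whose fields and methods may differ.

```java
import java.security.Principal;
import java.util.Objects;

// Hypothetical name-only principal, shown only to illustrate the concept.
final class NamedPrincipal implements Principal {
  private final String name;

  NamedPrincipal(String name) {
    this.name = Objects.requireNonNull(name, "name");
  }

  @Override
  public String getName() {
    return name;
  }

  // Value semantics: two principals with the same name are equal, which
  // matters when principals are used as map keys or compared in auth checks.
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof NamedPrincipal)) return false;
    return name.equals(((NamedPrincipal) o).name);
  }

  @Override
  public int hashCode() {
    return name.hashCode();
  }

  @Override
  public String toString() {
    return "NamedPrincipal[" + name + "]";
  }
}
```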
### Why are the changes needed?
For authentication extension.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing GA.
Closes #2665 from turboFei/spi_follow.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
While working on the bearer token authentication integration, I ran into a Base64 decoding issue with the token.
For bearer tokens, we should authenticate the token directly instead of Base64-decoding it.
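The distinction can be sketched as follows: Basic credentials are Base64-encoded `user:password` (RFC 7617) and must be decoded, while Bearer tokens are opaque and should be passed through unchanged. `AuthHeaderSketch` and `credential` are hypothetical names; this is not Celeborn's actual authentication filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AuthHeaderSketch {
  // Hypothetical helper: extracts the credential to authenticate from an
  // HTTP Authorization header value such as "Basic ..." or "Bearer ...".
  static String credential(String authorizationHeader) {
    int space = authorizationHeader.indexOf(' ');
    if (space < 0) {
      throw new IllegalArgumentException("Malformed Authorization header");
    }
    String scheme = authorizationHeader.substring(0, space);
    String value = authorizationHeader.substring(space + 1).trim();
    if ("Basic".equalsIgnoreCase(scheme)) {
      // Basic: Base64-encoded "user:password".
      return new String(Base64.getDecoder().decode(value), StandardCharsets.UTF_8);
    }
    if ("Bearer".equalsIgnoreCase(scheme)) {
      // Bearer: authenticate the raw token. Base64-decoding it would throw
      // for tokens that are not valid Base64, e.g. JWTs with '.' separators.
      return value;
    }
    throw new IllegalArgumentException("Unsupported scheme: " + scheme);
  }
}
```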

### Why are the changes needed?
To fix the bearer authentication issue.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Integration testing.
<img width="1727" alt="image" src="https://github.com/user-attachments/assets/0c03b73b-be08-45b0-81c4-006eebc5ac3b">
Closes #2666 from turboFei/bear_auth.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
We previously used the `jersey2` library for celeborn-openapi-client, and I found that the shaded celeborn-openapi-client was missing some dependencies.
I tried to fix it in PR #2640, but it seems difficult to maintain the transitive dependencies pulled in by jersey.
Pan suggested migrating the library from jersey2 to `apache-httpclient`.
FYI: https://openapi-generator.tech/docs/generators/java/
<img width="500" alt="image" src="https://github.com/user-attachments/assets/d102a7c9-46cd-4fd7-a2a0-7396a815776d">
To leverage the latest openapi-generator plugin, I upgraded openapi-generator to the latest version, 7.7.0, which requires JDK 11+.
Since Celeborn has not dropped Java 8 support yet, I included the generated code in the repo and added a user guide for regenerating it.
### Why are the changes needed?
To fix the dependency leak issue and make the dependencies easier to maintain.
### Does this PR introduce _any_ user-facing change?
No, this SDK has not been released yet, so there is no user-facing change.
### How was this patch tested?
Tested with a sample Maven project.
pom.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>test_openapi</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.celeborn</groupId>
            <artifactId>celeborn-openapi-client_2.12</artifactId>
            <version>0.6.0-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>
```
Testing code:
```java
package org.example;

import org.apache.celeborn.rest.v1.master.MasterApi;
import org.apache.celeborn.rest.v1.master.WorkerApi;
import org.apache.celeborn.rest.v1.master.invoker.ApiClient;

public class Main {
  public static void main(String[] args) throws Exception {
    String cmUrl = "http://***:9098";
    MasterApi masterApi = new MasterApi(new ApiClient().setBasePath(cmUrl));
    System.out.println(masterApi.getMasterGroupInfo().getLeader().getAddress().split(":")[0]);
    WorkerApi workerApi = new WorkerApi(new ApiClient().setBasePath(cmUrl));
    System.out.println(workerApi.getWorkers());
    System.out.println(workerApi.getWorkerEvents());
  }
}
```
```
java -Dfile.encoding=UTF-8 -classpath /Users/fwang12/todo/test_openapi/target/classes:/Users/fwang12/todo/celeborn/openapi/openapi-client/target/celeborn-openapi-client_2.12-0.6.0-SNAPSHOT.jar org.example.Main
```
<img width="1727" alt="image" src="https://github.com/user-attachments/assets/2da8b126-be96-4c37-9a33-ba196024f2ba">
Closes #2641 from turboFei/appache_httpclient.
Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce celeborn-spi module for authentication extensions.
### Why are the changes needed?
Address comments: https://github.com/apache/celeborn/pull/2632#issuecomment-2247132115
### Does this PR introduce _any_ user-facing change?
No, this interface has not been released.
### How was this patch tested?
UT.
Closes #2644 from turboFei/celeborn_spi.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
- Fix `WARNING`s reported by Error Prone.
- Disable `EmptyCatch`, `JdkObsolete`, `MutableConstantField` and `UnnecessaryParentheses`.
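To illustrate the kind of change these warnings call for, here is a hypothetical sketch (not code from this PR; all names are made up) of fixes for two of the reported checks: `IntLongMath`, where an `int` product can overflow before being widened to `long`, and `NonAtomicVolatileUpdate`, where `x++` on a `volatile` field is a non-atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicLong;

class ErrorProneFixes {
  // IntLongMath: the multiplication happens in int and can overflow
  // before the result is widened to long.
  static long totalBytesBuggy(int chunks, int chunkSize) {
    return chunks * chunkSize;
  }

  // Fix: widen one operand first so the multiplication is done in long.
  static long totalBytesFixed(int chunks, int chunkSize) {
    return (long) chunks * chunkSize;
  }

  // NonAtomicVolatileUpdate: "volatile long count; count++;" can lose updates
  // when several threads increment concurrently. AtomicLong makes the
  // increment a single atomic operation.
  private final AtomicLong count = new AtomicLong();

  long increment() {
    return count.incrementAndGet();
  }
}
```

With 65536 chunks of 65536 bytes, the buggy version wraps around to 0 while the fixed version returns 4294967296.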
### Why are the changes needed?
Error Prone generates many `WARNING`s. We should follow Error Prone's suggestions to fix them.
```
$ mvn clean install -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/SaslUtils.java:[44,25] [MutableConstantField] Constant field declarations should use the immutable type (such as ImmutableList) instead of the general collection interface type (such as List)
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/SaslUtils.java:[47,18] [MutableConstantField] Constant field declarations should use the immutable type (such as ImmutableList) instead of the general collection interface type (such as List)
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/client/TransportClientBootstrap.java:[34,5] [InvalidParam] Parameter name `channel` is unknown.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/client/TransportResponseHandler.java:[96,29] [StaticAssignmentInConstructor] This assignment is to a static field. Mutating static state from a constructor is highly error-prone.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/client/TransportResponseHandler.java:[104,30] [StaticAssignmentInConstructor] This assignment is to a static field. Mutating static state from a constructor is highly error-prone.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/anonymous/AnonymousSaslServerFactory.java:[67,2] [ClassCanBeStatic] Inner class is non-static but does not reference enclosing class
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/meta/FileInfo.java:[60,17] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/util/TransportFrameDecoder.java:[54,46] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManager.java:[207,29] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManager.java:[216,28] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/anonymous/AnonymousSaslClientFactory.java:[73,2] [ClassCanBeStatic] Inner class is non-static but does not reference enclosing class
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/anonymous/AnonymousSaslClientFactory.java:[93,31] [DefaultCharset] Implicit use of the platform default charset, which can result in differing behaviour between JVM executions or incorrect behavior if the encoding of the data source doesn't match expectations.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/util/ExceptionUtils.java:[65,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/util/ExceptionUtils.java:[66,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/SslSampleConfigs.java:[164,16] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/SslSampleConfigs.java:[165,14] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/SslSampleConfigs.java:[165,35] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/SSLTransportClientFactorySuiteJ.java:[32,14] [MissingOverride] setUp overrides method in TransportClientFactorySuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/SSLTransportClientFactorySuiteJ.java:[40,14] [MissingOverride] tearDown overrides method in TransportClientFactorySuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/protocol/EncryptedMessageWithHeaderSuiteJ.java:[124,6] [UseCorrectAssertInTests] Java assert is used in test. For testing purposes Assert.* matchers should be used.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/RpcIntegrationSuiteJ.java:[255,15] [UnusedMethod] Private method 'assertErrorAndClosed' is never used.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/RpcIntegrationSuiteJ.java:[154,17] [UnusedNestedClass] This nested class is unused, and can be removed.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/RpcIntegrationSuiteJ.java:[57,15] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManagerSuiteJ.java:[107,10] [AssertThrowsMultipleStatements] The lambda passed to assertThrows should contain exactly one statement
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManagerSuiteJ.java:[134,10] [AssertThrowsMultipleStatements] The lambda passed to assertThrows should contain exactly one statement
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/read/LocalPartitionReader.java:[84,31] [StaticAssignmentInConstructor] This assignment is to a static field. Mutating static state from a constructor is highly error-prone.
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[130,6] [ThreadLocalUsage] ThreadLocals should be stored in static fields
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[714,6] [MissingCasesInEnumSwitch] Non-exhaustive switch; either add a default or handle the remaining cases: SUCCESS, PARTIAL_SUCCESS, REQUEST_FAILED, and 43 others
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1609,10] [MissingCasesInEnumSwitch] Non-exhaustive switch; either add a default or handle the remaining cases: PARTIAL_SUCCESS, REQUEST_FAILED, SHUFFLE_ALREADY_REGISTERED, and 45 others
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1648,26] [MissingOverride] updateFileGroup implements method in ShuffleClient; expected Override
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1654,57] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1823,32] [MissingOverride] getDataClientFactory implements method in ShuffleClient; expected Override
[WARNING] /Users/nicholas/Github/celeborn/client/src/test/java/org/apache/celeborn/client/ShuffleClientSuiteJ.java:[185,6] [UseCorrectAssertInTests] Java assert is used in test. For testing purposes Assert.* matchers should be used.
[WARNING] /Users/nicholas/Github/celeborn/service/src/main/java/org/apache/celeborn/server/common/service/store/db/DbServiceManagerImpl.java:[70,33] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/service/src/main/java/org/apache/celeborn/server/common/service/store/db/DbServiceManagerImpl.java:[71,33] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/HARaftServer.java:[424,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/HARaftServer.java:[425,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/HARaftServer.java:[496,55] [UnescapedEntity] This looks like a type with type parameters. The < and > characters here will be interpreted as HTML, which can be avoided by wrapping it in a {code } tag.
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/SingleMasterMetaManager.java:[166,14] [MissingOverride] handleUpdatePartitionSize implements method in IMetadataHandler; expected Override
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/SlotsAllocator.java:[298,61] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/MapPartitionDataReader.java:[346,37] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[202,33] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[300,31] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[497,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[503,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[513,39] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/CreditStreamManager.java:[256,12] [ClassCanBeStatic] Inner class is non-static but does not reference enclosing class
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/WorkerSecretRegistryImpl.java:[73,12] [CacheLoaderNull] The result of CacheLoader#load must be non-null.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[69,13] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[73,13] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[103,24] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[104,39] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/MapPartitionDataWriter.java:[261,46] [ByteBufferBackingArray] ByteBuffer.array() shouldn't be called unless ByteBuffer.arrayOffset() is used or if the ByteBuffer was initialized using ByteBuffer.wrap() or ByteBuffer.allocate().
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ChunkStreamManager.java:[102,40] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ChunkStreamManager.java:[109,40] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/PartitionFilesSorter.java:[318,39] [IntLongMath] Expression of type int may overflow before being assigned to a long
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/FetchHandlerSuiteJ.java:[133,6] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/network/SSLRequestTimeoutIntegrationSuiteJ.java:[32,14] [MissingOverride] setUp overrides method in RequestTimeoutIntegrationSuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/network/SSLRequestTimeoutIntegrationSuiteJ.java:[40,14] [MissingOverride] tearDown overrides method in RequestTimeoutIntegrationSuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/ChunkFetchIntegrationSuiteJ.java:[74,15] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/ChunkFetchIntegrationSuiteJ.java:[186,47] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/SSLReducePartitionDataWriterSuiteJ.java:[30,26] [MissingOverride] createModuleTransportConf overrides method in DiskReducePartitionDataWriterSuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/local/DiskReducePartitionDataWriterSuiteJ.java:[234,47] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/memory/MemoryReducePartitionDataWriterSuiteJ.java:[198,47] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
```
```
$ mvn clean install -Pspark-2.4 -pl client-spark/common,client-spark/spark-2 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SortBasedPusher.java:[109,57] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SendBufferPool.java:[56,14] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SendBufferPool.java:[57,21] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java:[247,14] [UnusedMethod] Private method 'executorCores' is never used.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java:[120,55] [ReferenceEquality] Comparison using reference equality instead of value equality
```
```
$ mvn clean install -Pspark-3.5 -pl client-spark/spark-3 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/CelebornShuffleDataIO.java:[65,17] [MissingOverride] supportsReliableStorage implements method in ShuffleDriverComponents; expected Override
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java:[163,55] [ReferenceEquality] Comparison using reference equality instead of value equality
```
```
$ mvn clean install -Pflink-1.14 -pl client-flink/common,client-flink/flink-1.14 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/readclient/CelebornBufferStream.java:[161,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/readclient/CelebornBufferStream.java:[223,27] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/ShuffleTaskInfo.java:[46,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[99,66] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[236,21] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[251,19] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[267,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[354,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[392,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[473,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[533,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/buffer/TransferBufferPool.java:[182,33] [MixedMutabilityReturnType] This method returns both mutable and immutable collections or maps from different paths. This may be confusing for users of the method.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/utils/FlinkUtils.java:[34,6] [DoubleBraceInitialization] Prefer collection factory methods or builders to the double-brace initialization pattern.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/test/java/org/apache/celeborn/plugin/flink/BufferPackSuiteJ.java:[207,6] [CatchAndPrintStackTrace] Logging or rethrowing exceptions should usually be preferred to catching and calling printStackTrace
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.14/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
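The two most frequent warnings above are `SynchronizeOnNonFinalField` and `JdkObsolete` (`LinkedList`). The sketch below shows one common way to address both. `BufferQueueHolder` is a hypothetical class written for illustration, not the actual `RemoteShuffleInputGateDelegation` code. It locks on a dedicated `final` object and replaces `LinkedList` with `ArrayDeque`. Keep in mind the caveat from the warning itself: `ArrayDeque` rejects `null` elements.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical class for illustration only.
class BufferQueueHolder {
  // A dedicated final lock: all threads always synchronize on the same
  // monitor, even if other fields of this object are reassigned later.
  private final Object lock = new Object();

  // ArrayDeque instead of LinkedList, as JdkObsolete suggests.
  // Unlike LinkedList, ArrayDeque throws NullPointerException on null elements.
  private final Queue<String> receivedBuffers = new ArrayDeque<>();

  void add(String buffer) {
    synchronized (lock) {
      receivedBuffers.add(buffer);
    }
  }

  // Returns null when the queue is empty.
  String poll() {
    synchronized (lock) {
      return receivedBuffers.poll();
    }
  }

  int size() {
    synchronized (lock) {
      return receivedBuffers.size();
    }
  }
}
```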
```
$ mvn clean install -Pflink-1.15 -pl client-flink/flink-1.15 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.15/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
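The `CanonicalDuration` warning in `RemoteShuffleResultPartitionSuiteJ` is repeated across the Flink modules. The actual expression at `[140,67]` is not shown here, so the sketch below uses a hypothetical value to illustrate the kind of rewrite that this check asks for: express a literal duration in the largest unit that divides it evenly. Both forms produce the same `Duration`.

```java
import java.time.Duration;

// Hypothetical example; the values are not taken from the Celeborn test.
class CanonicalDurationExample {
  // Style the check flags: an exact number of minutes written in milliseconds.
  static Duration flagged() {
    return Duration.ofMillis(60_000);
  }

  // Preferred style: the same length expressed in the larger unit.
  static Duration canonical() {
    return Duration.ofMinutes(1);
  }
}
```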
```
$ mvn clean install -Pflink-1.16 -pl client-flink/flink-1.16 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
```
```
$ mvn clean install -Pflink-1.17 -pl client-flink/flink-1.17 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.17/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pflink-1.18 -pl client-flink/flink-1.18 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.18/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pflink-1.19 -pl client-flink/flink-1.19 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.19/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pmr -pl client-mr/mr -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
```
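The `DoubleBraceInitialization` warning (`FlinkUtils.java`) and the `MixedMutabilityReturnType` warning (`TransferBufferPool.java`) listed earlier have standard fixes. The sketch below shows them in Java 8-compatible form. The class, method names, and map keys are hypothetical and do not reproduce the Celeborn code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical examples for illustration only.
class CollectionWarningsExample {
  // Instead of `new HashMap<String, String>() {{ put(...); }}`, which creates an
  // anonymous subclass, populate a plain map and wrap it as unmodifiable.
  static Map<String, String> buildConfig() {
    Map<String, String> conf = new HashMap<>();
    conf.put("key1", "value1");
    conf.put("key2", "value2");
    return Collections.unmodifiableMap(conf);
  }

  // Return an unmodifiable list on every path, rather than an immutable
  // Collections.emptyList() on one path and a mutable ArrayList on another.
  static List<Integer> takeFirst(List<Integer> source, int n) {
    if (n <= 0 || source.isEmpty()) {
      return Collections.emptyList();
    }
    List<Integer> result = new ArrayList<>(source.subList(0, Math.min(n, source.size())));
    return Collections.unmodifiableList(result);
  }
}
```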
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
```
$ mvn clean install -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pspark-2.4 -pl client-spark/common,client-spark/spark-2 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pspark-3.5 -pl client-spark/spark-3 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.14 -pl client-flink/common,client-flink/flink-1.14 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.15 -pl client-flink/flink-1.15 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.16 -pl client-flink/flink-1.16 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.17 -pl client-flink/flink-1.17 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.18 -pl client-flink/flink-1.18 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.19 -pl client-flink/flink-1.19 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pmr -pl client-mr/mr -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
```
Closes #2555 from SteNicholas/CELEBORN-1190.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
### What changes were proposed in this pull request?
As the title says, this adds support for storing shuffle data on Amazon S3.
### Why are the changes needed?
Currently, Celeborn does not support writing shuffle data directly to Amazon S3. This is a limitation when migrating on-premises servers to AWS and using S3 as the storage for shuffle data.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Closes #2579 from zhaohehuhu/dev-0619.
Authored-by: zhaohehuhu <luoyedeyi@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Fix the following API responses:
   - master `GET /api/v1/masters`
   - master `GET /api/v1/applications/top_disk_usages`
   - master & worker `/api/v1/thread_dump`
2. Fix a typo in the migration guide.
3. Refine the API annotations: METHOD -> PATH.
4. Enhance the `RestExceptionMapper`.
### Why are the changes needed?
For `/api/v1/masters`, the `id` field is not properly formatted: it contains the raw `ByteString` debug representation instead of the readable peer ID.
```
{
  "groupId": "c5196f6d-2c34-3ed3-8b8a-47bede733167",
  "leader": {
    "id": "<ByteString4960c29e size=1 contents=\"0\">",
    "address": "...:9872"
  },
  ...
}
```
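One way to render the `id` readably is to decode its bytes as text rather than printing the object's default string form. The sketch below assumes the peer ID is UTF-8 text, as `contents=\"0\"` suggests, and uses a plain `byte[]` in place of the protobuf `ByteString` to stay dependency-free (protobuf's `ByteString.toStringUtf8()` performs the equivalent decoding). `PeerIdFormatter` is a hypothetical helper, not the code from this PR.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class PeerIdFormatter {
  // Default Object.toString() on the raw bytes: an unreadable "[B@<hash>" string,
  // analogous to the "<ByteString... >" value in the response above.
  static String debugForm(byte[] rawId) {
    return rawId.toString();
  }

  // Decode the bytes as UTF-8 text, so an id of "0" is rendered as "0".
  static String format(byte[] rawId) {
    return new String(rawId, StandardCharsets.UTF_8);
  }
}
```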
For `/api/v1/applications/top_disk_usages`, the request throws a `NullPointerException`, so null items should be filtered out:
```
24/07/18 21:52:38,506 WARN [master-JettyThreadPool-40] RestExceptionMapper: Error occurs on accessing REST API.
java.lang.NullPointerException
    at org.apache.celeborn.service.deploy.master.http.api.v1.ApplicationResource.$anonfun$topDiskUsedApplications$2(ApplicationResource.scala:78)
```
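Filtering out null entries before sorting keeps the comparator from dereferencing `null`. The Java sketch below illustrates the idea with a hypothetical `AppDiskUsage` type and `topApps` method. The actual fix lives in the Scala `ApplicationResource`, and its data types are not shown here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical example; names and types are illustrative only.
class TopDiskUsageExample {
  // Stand-in for a per-application disk usage record.
  static final class AppDiskUsage {
    final String appId;
    final long bytes;

    AppDiskUsage(String appId, long bytes) {
      this.appId = appId;
      this.bytes = bytes;
    }
  }

  // Drop null entries first, then return the ids of the `limit` largest users.
  static List<String> topApps(List<AppDiskUsage> usages, int limit) {
    return usages.stream()
        .filter(Objects::nonNull)
        .sorted(Comparator.comparingLong((AppDiskUsage u) -> u.bytes).reversed())
        .limit(limit)
        .map(u -> u.appId)
        .collect(Collectors.toList());
  }
}
```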
For `/api/v1/thread_dump`, the resource appears to need `@Produces(Array(MediaType.APPLICATION_JSON))`; without it, Jersey finds no `MessageBodyWriter` for media type `text/html`:
```
Caused by: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
    at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:65)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1116)
    at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:649)
    at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:380)
    at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:426)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:264)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
    ... 36 more
Caused by: org.glassfish.jersey.message.internal.MessageBodyProviderNotFoundException: MessageBodyWriter not found for media type=text/html, type=class scala.collection.immutable.Map$Map1, genericType=class scala.collection.immutable.Map$Map1.
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:224)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:85)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:61)
    ... 51 more
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Integration testing.
For `api/v1/masters`:
<img width="824" alt="image" src="https://github.com/user-attachments/assets/c0908d05-aebc-435a-8446-038dd18fb7cd">
For master `api/v1/applications/top_disk_usages`:
<img width="559" alt="image" src="https://github.com/user-attachments/assets/50860735-9975-449a-9f77-24d8eafd2018">
For `api/v1/thread_dump`:
<img width="1188" alt="image" src="https://github.com/user-attachments/assets/9844de22-45c6-46ba-9260-c8a7d28c2e1d">
Closes #2637 from turboFei/fix_id_info.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>