Commit Graph

955 Commits

Author SHA1 Message Date
Xianming Lei
1d44e5fbfe [CELEBORN-1487][PHASE1] CongestionController support control traffic by user/worker traffic speed
### What changes were proposed in this pull request?
Introduce support control traffic by user/worker traffic speed.

### Why are the changes needed?
Currently, Celeborn only supports quota management based on disk file bytes/count, and this quota management cannot cope with sudden increases in traffic, which will cause corrupt to the cluster.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
UTs.

Closes #2797 from leixm/issue_1487_1.

Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-10-12 10:17:33 +08:00
mingji
f871c9861c
[CELEBORN-1637] Enhance config to bypass memory check for partition file sorter
### What changes were proposed in this pull request?
Add a config to bypass memory check when sorting shuffle files.

### Why are the changes needed?
If a celeborn worker has quite a large memory and it supports both Spark and Flink engines. This config should be enabled.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Cluster test.

Closes #2798 from FMX/b1637.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-10-11 19:41:40 +08:00
SteNicholas
b3bf68edc7
[CELEBORN-474][FOLLOWUP] Use JavaUtils#newConcurrentHashMap to speed up ConcurrentHashMap#computeIfAbsent
### What changes were proposed in this pull request?

Use `JavaUtils#newConcurrentHashMap` to speed up `ConcurrentHashMap#computeIfAbsent`.

Follow up #1383.

### Why are the changes needed?

Celeborn supports JDK8, which could meet the bug mentioned in [JDK-8161372](https://bugs.openjdk.org/browse/JDK-8161372). Therefore, it's better to use `JavaUtils#newConcurrentHashMap` to speed up `ConcurrentHashMap#computeIfAbsent`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #2796 from SteNicholas/CELEBORN-474.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-10-09 16:49:27 +08:00
Wang, Fei
c3d33daabc [CELEBORN-1627] Introduce instance variable for celeborn dashboard to filter metrics
### What changes were proposed in this pull request?

1. add `instanceLabel` in metrics source, prefer `FQDN:port` than `ip:port` even with `celeborn.network.bind.preferIpAddress=false` before
2. add variable  `instance` with  `label_values(metrics_JVMCPUTime_Value, instance)` same as `celeborn-jvm-dashboard.json`
3. add filter `instance=~"${instance}"` for every metrics
4. add missing `legendFormat` for memory file storage metrics expressions

### Why are the changes needed?

There should be too many celeborn instances in production use case, it is better to add filter with instance.

### Does this PR introduce _any_ user-facing change?
Yes. introduce new variable.

But the instance default value is `ALL`, same behavior as before.

### How was this patch tested?

Config: `celeborn.network.bind.preferIpAddress=false`
<img width="1141" alt="image" src="https://github.com/user-attachments/assets/c3161069-790a-4cb2-8654-6d52cf8e5fb0">
<img width="944" alt="image" src="https://github.com/user-attachments/assets/293b8bd4-252a-459c-aa86-5f4aa75eb594">

<img width="939" alt="image" src="https://github.com/user-attachments/assets/1e1b28af-dd71-4c5b-8285-57473a6c9650">

For JVM metrics, before it was ip:port, and now it is FQDN:port.
<img width="947" alt="image" src="https://github.com/user-attachments/assets/fe00762f-605d-4b5e-b0a4-c586bdc0ec1a">

Closes #2777 from turboFei/legend_base.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-10-09 14:47:03 +08:00
szt
6629be858b [CELEBORN-1574] Speed up unregister shuffle by batch processing
### What changes were proposed in this pull request?
In order to speed up the resource releasing,this PR Unregister shuffle in batch;

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT & Local cluster testing

Closes #2701 from zaynt4606/batchUnregister.

Lead-authored-by: szt <zaynt4606@163.com>
Co-authored-by: Zaynt <shuaizhentao.szt@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-10-08 14:03:41 +08:00
Wang, Fei
b2aa359d91 [CELEBORN-1609] Support SSL for celeborn RESTful service
### What changes were proposed in this pull request?
Support SSL for celeborn RESTful service.

### Why are the changes needed?
For HTTP SSL connection requirements.

### Does this PR introduce _any_ user-facing change?
No, SSL is disabled by defaults.

### How was this patch tested?

Integration testing.

```
celeborn.master.http.ssl.enabled=true
celeborn.master.http.ssl.keystore.path=/hadoop/keystore.jks
celeborn.master.http.ssl.keystore.password=xxxxxxx
```
<img width="1143" alt="image" src="https://github.com/user-attachments/assets/2334561d-1de3-4b38-bc80-5d5d86d3b8ff">

<img width="695" alt="image" src="https://github.com/user-attachments/assets/e3877468-cc3b-4a4a-bf75-2994f557a104">

Closes #2756 from turboFei/HADP_1609_ssl2.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-09-25 10:30:10 +08:00
Wang, Fei
1596cffb39 [CELEBORN-1607] Enable useEnumCaseInsensitive for openapi-generator
### What changes were proposed in this pull request?
Enable `useEnumCaseInsensitive` for openapi-generator.
And then in celeborn server end, the enum will be mapped to celeborn internal WorkerEventType.

### Why are the changes needed?

I met exception when sending worker event with openapi sdk.
```
Exception in thread "main" ApiException{code=400, responseHeaders={Server=[Jetty(9.4.52.v20230823)], Content-Length=[491], Date=[Fri, 20 Sep 2024 23:50:27 GMT], Content-Type=[text/plain]}, responseBody='Cannot deserialize value of type `org.apache.celeborn.rest.v1.model.SendWorkerEventRequest$EventTypeEnum` from String "DecommissionThenIdle": not one of the values accepted for Enum class: [DECOMMISSION_THEN_IDLE, GRACEFUL, NONE, DECOMMISSION, IMMEDIATELY, RECOMMISSION]
 at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 14] (through reference chain: org.apache.celeborn.rest.v1.model.SendWorkerEventRequest["eventType"])'}
    at org.apache.celeborn.rest.v1.master.invoker.ApiClient.processResponse(ApiClient.java:913)
    at org.apache.celeborn.rest.v1.master.invoker.ApiClient.invokeAPI(ApiClient.java:1000)
    at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:378)
    at org.apache.celeborn.rest.v1.master.WorkerApi.sendWorkerEvent(WorkerApi.java:334)
    at org.example.Main.main(Main.java:22)

```

The testing code to re-produce:
```
package org.example;

import org.apache.celeborn.rest.v1.master.WorkerApi;
import org.apache.celeborn.rest.v1.master.invoker.ApiClient;
import org.apache.celeborn.rest.v1.model.ExcludeWorkerRequest;
import org.apache.celeborn.rest.v1.model.SendWorkerEventRequest;
import org.apache.celeborn.rest.v1.model.WorkerId;

public class Main {
    public static void main(String[] args) throws Exception {

        String cmUrl = "http://localhost:9098";
        WorkerApi workerApi = new WorkerApi(new ApiClient().setBasePath(cmUrl));
        workerApi.excludeWorker(new ExcludeWorkerRequest()
                .addAddItem(new WorkerId()
                        .host("localhost")
                        .rpcPort(1)
                        .pushPort(2)
                        .fetchPort(3)
                        .replicatePort(4)));
        workerApi.sendWorkerEvent(new SendWorkerEventRequest()
                        .addWorkersItem(new WorkerId()
                                .host("127.0.0.1")
                                .rpcPort(56116)
                                .pushPort(56117)
                                .fetchPort(56119)
                                .replicatePort(56118))
                .eventType(SendWorkerEventRequest.EventTypeEnum.DECOMMISSION_THEN_IDLE));
    }
}
```

Seems because for the EventTypeEnum, the name and value not the same and then cause this issue.

Not sure why the UT passed, but the integration testing failed.

For EventTypeEnum, because its value is case sensitive, so we meet this issue.

8734d16638/openapi/openapi-client/src/main/java/org/apache/celeborn/rest/v1/model/SendWorkerEventRequest.java (L47-L83)

Related issue in jersey end I think, https://github.com/eclipse-ee4j/jersey/issues/5288

In this PR, `useEnumCaseInsensitive` is enabled for openapi-generator.

### Does this PR introduce _any_ user-facing change?
No, there is not user facing change and this SDK has not been released yet.

### How was this patch tested?
Existing UT and Integration testing.
<img width="1265" alt="image" src="https://github.com/user-attachments/assets/6a34a0dd-c474-4e8d-b372-19b0fda94972">

Closes #2754 from turboFei/eventTypeEnumMapping.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-23 20:43:04 +08:00
Aravind Patnam
dd25388955
[CELEBORN-1513][FOLLOWUP] Enrich doc for wildcard address
### What changes were proposed in this pull request?
enrich the docs for supporting wildcard address bind in this [PR](https://github.com/apache/celeborn/pull/2713).

### Why are the changes needed?
better docs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A - just docs change

Closes #2736 from akpatnam25/CELEBORN-1513-doc-followup.

Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-09-13 16:15:37 +08:00
SteNicholas
5875f6871f [CELEBORN-1506][FOLLOWUP] InFlightRequestTracker should not reset totalInflightReqs for cleaning up to avoid negative totalInflightReqs for limitZeroInFlight
### What changes were proposed in this pull request?

`InFlightRequestTracker` should not reset `totalInflightReqs` for cleaning up to avoid negative `totalInflightReqs` for `limitZeroInFlight`.

Follow up #2621.

### Why are the changes needed?

After #2621, there is a common case that attempt 0 succeeds in write and celeborn client successfully calls mapperEnd, but the attempt fails due to certain exception like `ExecutorLostFailure` . Meanwhile, other attempts are rerun, then clean up `pushState` because of mapper ended. The case causes the exception which is `Waiting timeout for task %s while limiting zero in-flight requests` for `limitZeroInFlight`. Therefore, `InFlightRequestTracker` could not reset `totalInflightReqs` for cleaning up to avoid negative `totalInflightReqs` for `limitZeroInFlight`. `InFlightRequestTracker` uses `cleaned` flag in `limitMaxInFlight` and `limitZeroInFlight`.

![image](https://github.com/user-attachments/assets/3b66d42e-5d6a-411f-8c3a-360349897ab7)

```
4/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] WARN InFlightRequestTracker: Clear InFlightRequestTracker
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.36.10:9092, creating a new one.
24/09/05 08:27:04 [data-client-5-1] WARN InFlightRequestTracker: BatchIdSet of 172.27.164.39:9092 is null.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.168.38:9092, creating a new one.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.36.31:9092, creating a new one.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.160.19:9092, creating a new one.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.32.31:9092, creating a new one.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.44.32:9092, creating a new one.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.168.18:9092, creating a new one.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.172.18:9092, creating a new one.
24/09/05 08:27:04 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] INFO TransportClientFactory: Found inactive connection to /172.27.164.19:9092, creating a new one.
24/09/05 08:27:15 [dispatcher-Executor] INFO Executor: Executor is trying to kill task 706.1 in stage 1.0 (TID 1203), reason: another attempt succeeded
24/09/05 08:27:15 [Executor task launch worker for task 706.1 in stage 1.0 (TID 1203)] INFO Executor: Executor interrupted and killed task 706.1 in stage 1.0 (TID 1203), reason: another attempt succeeded
24/09/05 08:47:08 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] ERROR InFlightRequestTracker: After waiting for 1200000 ms, there are still [] in flight, which exceeds the current limit 0.
24/09/05 08:47:08 [Executor task launch worker for task 715.2 in stage 1.0 (TID 1205)] ERROR Executor: Exception in task 715.2 in stage 1.0 (TID 1205)
org.apache.celeborn.common.exception.CelebornIOException: Waiting timeout for task 1-715-2 while limiting zero in-flight requests
	at org.apache.celeborn.client.ShuffleClientImpl.limitZeroInFlight(ShuffleClientImpl.java:676)
	at org.apache.celeborn.client.ShuffleClientImpl.mapEndInternal(ShuffleClientImpl.java:1555)
	at org.apache.celeborn.client.ShuffleClientImpl.mapperEnd(ShuffleClientImpl.java:1539)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.close(HashBasedShuffleWriter.java:367)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.write(HashBasedShuffleWriter.java:175)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:100)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:144)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:598)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1545)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:603)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
24/09/05 08:49:21 [dispatcher-Executor] INFO YarnCoarseGrainedExecutorBackend: Driver commanded a shutdown
24/09/05 08:49:21 [CoarseGrainedExecutorBackend-stop-executor] WARN ShuffleClientImpl: Shuffle client has been shutdown!
24/09/05 08:49:21 [CoarseGrainedExecutorBackend-stop-executor] INFO MemoryStore: MemoryStore cleared
24/09/05 08:49:21 [CoarseGrainedExecutorBackend-stop-executor] INFO BlockManager: BlockManager stopped
24/09/05 08:49:21 [pool-5-thread-1] INFO ShutdownHookManager: Shutdown hook called
ntImpl.mapperEnd(ShuffleClientImpl.java:1539)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.close(HashBasedShuffleWriter.java:367)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.write(HashBasedShuffleWriter.java:175)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:100)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:144)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:598)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1545)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:603)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2725 from SteNicholas/CELEBORN-1506.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-09-06 17:25:13 +08:00
Aravind Patnam
f06dba14b3 [CELEBORN-1513] Support wildcard bind in dual stack environments
### What changes were proposed in this pull request?
Support wildcard bind for RPC and HTTP servers. When wildcard address is used, the service is able to listen to both ipv4 and ipv6 traffic in dual-stack environments.

The specific scenario where this becomes relevant is as follows:

If some of the compute infrastructure is IPv4 only, some v6 only and others dual stack - the way we can have a single Celeborn infra to cater to all is by:
a) Set bind.preferip to false - so that advertised address is the host and not IP.

b) bind to wild card address

With both in place, the v4 only compute nodes will resolve the v4 address and connect to v4 ip/port.
Likewise, for v6 only.
Dual stack compute nodes will use prefer ipv6 Java flag to resolve to either v4 or v6.

This is how we are handling the combination of scenarios internally.

### Why are the changes needed?
see above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested on a server using netstat, and tried connecting to via `nc -4` and `nc -6` to ensure connection was there.

Closes #2713 from akpatnam25/CELEBORN-1513-fix.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-06 16:47:42 +08:00
SteNicholas
29c0f7e50f [CELEBORN-1501] Introduce application dimension resource consumption metrics of Worker
### What changes were proposed in this pull request?

Introduce application dimension resource consumption metrics of Worker for `ResourceConsumptionSource`.

### Why are the changes needed?

`ResourceConsumption` namespace metrics are generated for each user and they are identified using a metric tag at present. It's recommended to introduce application dimension resource consumption metrics that expose application dimension resource consumption of Worker. By monitoring resource consumption in the application dimension, you can obtain the actual situation of application resource consumption.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
curl http://celeborn-worker:9096/metrics|grep applicationId|grep disk|head -20
metrics_diskFileCount_Value{applicationId="application_1720756171504_197094_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 42 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1720756171504_197094_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 27157332949 1721132143020
metrics_diskFileCount_Value{applicationId="application_1718714878734_1549139_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 47 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1718714878734_1549139_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 483045590821721132143020
metrics_diskFileCount_Value{applicationId="application_1688369676084_19713351_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 20 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1688369676084_19713351_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 13112170199 1721132143020
metrics_diskFileCount_Value{applicationId="application_1718714878734_1552645_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 45 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1718714878734_1552645_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 353350343061721132143020
metrics_diskFileCount_Value{applicationId="application_1718714878734_1552665_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 59 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1718714878734_1552665_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 476373757311721132143020
metrics_diskFileCount_Value{applicationId="application_1720756171504_199529_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 59 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1720756171504_199529_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 54106810966 1721132143020
metrics_diskFileCount_Value{applicationId="application_1720756171504_199536_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 19 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1720756171504_199536_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 9215818606 1721132143020
metrics_diskFileCount_Value{applicationId="application_1650016801129_34416161_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 26 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1650016801129_34416161_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 23650636804 1721132143020
metrics_diskFileCount_Value{applicationId="application_1716712852097_2884119_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 12 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1716712852097_2884119_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 650314937 1721132143020
metrics_diskFileCount_Value{applicationId="application_1718714878734_1563526_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 16 1721132143020
metrics_diskBytesWritten_Value{applicationId="application_1718714878734_1563526_1",hostName="celeborn-worker",name="default",role="Worker",tenantId="default"} 1555862722 1721132143020
```
<img width="1351" alt="image" src="https://github.com/user-attachments/assets/3e007e80-7329-467b-bf74-cfe502b62ae5">
<img width="1351" alt="image" src="https://github.com/user-attachments/assets/d93ee335-c078-46b8-b682-3b1a04f8a614">
<img width="1351" alt="image" src="https://github.com/user-attachments/assets/62790378-38aa-480f-b959-6fdbad617808">
<img width="1352" alt="image" src="https://github.com/user-attachments/assets/b6717316-0b44-4a7b-a55b-4ffa844ded66">

Closes #2630 from SteNicholas/CELEBORN-1292.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-09-06 14:34:23 +08:00
Aravind Patnam
cc26131f88 [CELEBORN-1572] Celeborn CLI initial REST API support
### What changes were proposed in this pull request?
Introducing the Celeborn CLI (based on this [CPIP](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI)). For the first iteration, adding support for querying the existing REST api endpoints.
After this will add a layer for external cluster manager support. Further improvements are needed such as pretty print, which can be added in subsequent PRs.

### Why are the changes needed?
see [CPIP](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI)

### Does this PR introduce _any_ user-facing change?
yes, new CLI tool.

### How was this patch tested?
added UTs and also tested internally.

Closes #2699 from akpatnam25/cli-CELEBORN-1572.

Lead-authored-by: Aravind Patnam <apatnam@linkedin.com>
Co-authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
2024-09-05 11:15:16 -05:00
Weijie Guo
40a654655b [CELEBORN-1490][CIP-6] Extends FileMeta to support hybrid shuffle
### What changes were proposed in this pull request?

Second PR to support Hybrid Shuffle.

This extends `FileMeta` as hybrid shuffle supports reading while writing at segment-grained.

We also introduce `isSegmentGranularityVisible` to some message and method to identify segment-based shuffle.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no need.

Closes #2716 from reswqa/cip6-2-pr.

Authored-by: Weijie Guo <reswqa@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-09-04 19:02:22 +08:00
Weijie Guo
5d61458964 [CELEBORN-1490][CIP-6] Extends message to support hybrid shuffle
### What changes were proposed in this pull request?

This is the first PR to support Hybrid Shuffle.

Extends message to support hybrid shuffle.

### Why are the changes needed?
hybrid shuffle is a tiered storage architecture, which introduces the concept of `segment`. One segment's data selects a tier to send. Data is split into segments and sent to multiple tiers.

This PR introduces segment-related message. In addition, hybrid shuffle needs to distinguish which subpartition it comes from when consuming data, so we need to extend the `SubpartitionId` field to `ReadData` (new class introduced for compatibility).

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no need.

Closes #2714 from reswqa/cip6-1-extend-message.

Authored-by: Weijie Guo <reswqa@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-30 09:43:09 +08:00
sychen
3ee672e15d
[CELEBORN-1565] Introduce warn-unused-import in Scala
### What changes were proposed in this pull request?
This PR aims to introduce `warn-unused-import` in Scala.

### Why are the changes needed?
There are currently many invalid imports, which can be checked using `-Ywarn-unused-import`.
And through `silencer`  plugin we can avoid some imports required in scala 2.11.

```scala
import org.apache.celeborn.common.util.FunctionConverter._
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2689 from cxzl25/CELEBORN-1565.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shaoyun Chen <csy@apache.org>
2024-08-29 13:43:17 +08:00
SteNicholas
f801b7a32d [CELEBORN-1583] MasterClient#sendMessageInner should throw Throwable for celeborn.masterClient.maxRetries is 0
### What changes were proposed in this pull request?

`MasterClient#sendMessageInner` should throw `Throwable` for `celeborn.masterClient.maxRetries` is 0.

### Why are the changes needed?

`MasterClient#sendMessageInner` causes `NullPointerException` with `Cannot throw exception because "throwable" is null` for `celeborn.masterClient.maxRetries` is 0.

```
2024-08-27T19:07:03.7681998Z 24/08/27 19:07:03,767 ERROR [celeborn-dispatcher-2] MasterClient: Send rpc with failure, has tried 0, max try 0!
2024-08-27T19:07:03.7693891Z 24/08/27 19:07:03,767 ERROR [celeborn-dispatcher-2] LifecycleManager: AskSync RegisterShuffle for app-1-1 failed.
2024-08-27T19:07:03.7695444Z java.lang.NullPointerException: Cannot throw exception because "throwable" is null
2024-08-27T19:07:03.7696857Z 	at org.apache.celeborn.common.client.MasterClient.sendMessageInner(MasterClient.java:167)
2024-08-27T19:07:03.7698346Z 	at org.apache.celeborn.common.client.MasterClient.askSync(MasterClient.java:121)
2024-08-27T19:07:03.7699927Z 	at org.apache.celeborn.client.LifecycleManager.requestMasterRequestSlots(LifecycleManager.scala:1621)
2024-08-27T19:07:03.7701836Z 	at org.apache.celeborn.client.LifecycleManager.requestMasterRequestSlotsWithRetry(LifecycleManager.scala:1610)
2024-08-27T19:07:03.7703976Z 	at org.apache.celeborn.client.LifecycleManager.org$apache$celeborn$client$LifecycleManager$$offerAndReserveSlots(LifecycleManager.scala:642)
2024-08-27T19:07:03.7706423Z 	at org.apache.celeborn.client.LifecycleManager$$anonfun$receiveAndReply$1.applyOrElse(LifecycleManager.scala:338)
2024-08-27T19:07:03.7708030Z 	at org.apache.celeborn.common.rpc.netty.Inbox.processInternal(Inbox.scala:119)
2024-08-27T19:07:03.7709352Z 	at org.apache.celeborn.common.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:218)
2024-08-27T19:07:03.7710619Z 	at org.apache.celeborn.common.rpc.netty.Inbox.safelyCall(Inbox.scala:314)
2024-08-27T19:07:03.7711825Z 	at org.apache.celeborn.common.rpc.netty.Inbox.process(Inbox.scala:218)
2024-08-27T19:07:03.7713139Z 	at org.apache.celeborn.common.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:238)
2024-08-27T19:07:03.7714639Z 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2024-08-27T19:07:03.7716148Z 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2024-08-27T19:07:03.7717292Z 	at java.base/java.lang.Thread.run(Thread.java:840)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`MasterClientSuiteJ#testSendMessageWithoutHAWithoutRetry`

Closes #2715 from SteNicholas/CELEBORN-1583.

Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-29 11:58:26 +08:00
Bowen Liang
4844c82519 [CELEBORN-1560] Remove usages of deprecated Files.createTempDir of Guava
### What changes were proposed in this pull request?

### Why are the changes needed?

`com.google.common.io.Files#createTempDir` has been deprecated since long ago.
`java.nio.file.Files#createTempDirectory` should be used instead, as suggested in Guava's API Javadoc. (https://guava.dev/releases/33.1.0-jre/api/docs/com/google/common/io/Files.html)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2680 from bowenliang123/files-temp-dir.

Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-26 14:43:27 +08:00
Bowen Liang
f226424b9a [CLEBORN-1555] Replace deprecated config celeborn.storage.activeTypes in docs and tests
### What changes were proposed in this pull request?

Replace the deprecated config `celeborn.storage.activeTypes` with `celeborn.storage.availableTypes` in docs and tests, guiding the new comers to use the new config names.

### Why are the changes needed?
The config `celeborn.storage.activeTypes` has been deprecated in 0.4.0 release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No feature changed.

Closes #2675 from bowenliang123/avai-types.

Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-26 14:36:01 +08:00
SteNicholas
9fefa66318 [CELEBORN-1550][FOLLOWUP] Support celeborn.dynamicConfig.store.backend with short name and backend implementation
### What changes were proposed in this pull request?

Introduce `ConfigStore` to support `celeborn.dynamicConfig.store.backend` with short name and backend implementation.

### Why are the changes needed?

`celeborn.dynamicConfig.store.backend` is allowed to be specified in two ways:

- Using short names: Default available options are FS, DB.
- Using the fully qualified class name of the backend implementation.

Therefore, it's recommended to introduce `ConfigStore` based on SPI  mechanism for `celeborn.dynamicConfig.store.backend` instead of `dynamicConfigStoreBackendShortNames` which could not add other short name of backend implementation for users.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #2698 from SteNicholas/CELEBORN-1550.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-23 13:53:21 +08:00
Wang, Fei
ae41cb5ade [CELEBORN-1537] Support to remove workers unavailable info with RESTful api
### What changes were proposed in this pull request?
In [CELEBORN-1535](https://issues.apache.org/jira/browse/CELEBORN-1535), we support to disable master workerUnavilableInfo expiration.

 In this PR,  a new RestAPI  introduced for manually remove unavailable workers. Then it can be used on demand.

### Why are the changes needed?
To cleanup the works unavailable info on demand manually if we disable the expiration.

### Does this PR introduce _any_ user-facing change?
Yes, a new RESTful API.

### How was this patch tested?
UT.

Closes #2658 from turboFei/support_cleanup.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-19 11:10:33 +08:00
Aravind Patnam
2ad44baeef [CELEBORN-1563] Log networkLocation in WorkerInfo
### What changes were proposed in this pull request?
Log networkLocation so that it appears during worker registration

### Why are the changes needed?
in the case of custom network location, it would be good to log it

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
updated unit tests

Closes #2685 from akpatnam25/CELEBORN-1563.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-08-16 16:26:45 +08:00
Aravind Patnam
74423fbd6d [CELEBORN-1549] Fix networkLocation persistence into Ratis
### What changes were proposed in this pull request?
Fixing a bug where the `networkLocation` is not persisted in Ratis, and the master defaults to `DEFAULT_RACK` when it loads the snapshot. This was missed in https://github.com/apache/celeborn/pull/2367 unfortunately, and it came up during our stress testing internally.

### Why are the changes needed?
Needed for custom network aware replication, so that networkLocation state is kept in snapshot file.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Updated unit test to ensure serde is correct.

Closes #2669 from akpatnam25/CELEBORN-1549.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-08-14 13:55:19 +08:00
Sanskar Modi
a0b04d0036 [CELEBORN-1550] Add support of providing custom dynamic store backend implementation
### What changes were proposed in this pull request?

Adding support of providing custom dynamic store backend implementation, users can now pass there own implementation for dynamic config store backend.

This change also keep the backwards compatibility of supporting short names for backend like "FS" and "DB"

### Why are the changes needed?

Currently celeborn only supports File and DB based backend while there can be other ways of managing these configs.

### Does this PR introduce _any_ user-facing change?

NO, user facing behaviour will be same.

### How was this patch tested?

Existing UTs verifies that this change is working for "FS" and "DB" implementation.

Closes #2670 from s0nskar/dynamic_config.

Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-08-12 15:04:43 +08:00
zhaohehuhu
59b88beb62 [CELEBORN-1529] Read shuffle data from S3
### What changes were proposed in this pull request?
as title

### Why are the changes needed?

The change aims to make Celeborn read shuffle data from S3

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
Yes

Closes #2651 from zhaohehuhu/dev-0726.

Authored-by: zhaohehuhu <luoyedeyi@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-08-12 14:53:14 +08:00
Mridul Muralidharan
3234bef81b [CELEBORN-1518] Add support for Apache Spark barrier stages
### What changes were proposed in this pull request?

Adds support for barrier stages.
This involves two aspects:
a) If there is a task failure when executing a barrier stage, all shuffle output for the stage attempt are discarded and ignored.
b) If there is a reexecution of a barrier stage (for ex, due to child stage getting a fetch failure), all shuffle output for the previous stage attempt are discarded and ignored.

This is similar to handling of indeterminate stages when `throwsFetchFailure` is `true`.

Note that this is supported only when `spark.celeborn.client.spark.fetch.throwsFetchFailure` is `true`

### Why are the changes needed?

As detailed in CELEBORN-1518, Celeborn currently does not support barrier stages; which is an essential functionality in Apache Spark which is widely in use by Spark users.
Enhancing Celeborn will allow its use for a wider set of Spark users.

### Does this PR introduce _any_ user-facing change?

Adds ability for Celeborn to support Apache Spark Barrier stages.

### How was this patch tested?

Existing tests, and additional tests (thanks to jiang13021 in #2609 - [see here](https://github.com/apache/celeborn/pull/2609/files#diff-e17f15fcca26ddfc412f0af159c784d72417b0f22598e1b1ebfcacd6d4c3ad35))

Closes #2639 from mridulm/fix-barrier-stage-reexecution.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-08-12 14:20:11 +08:00
Wang, Fei
91058acfda [CELEBORN-1542] Master supports to check the worker host pattern on worker registration
### What changes were proposed in this pull request?
This pr introduce an optional config item for worker host pattern, and support to check whether the worker host matches the pattern in master end when registering the worker.

If it does not match, the register worker request will be rejected.

### Why are the changes needed?
Currently, the celeborn master allow all the workers to register. It is better to limit the workers allowed to register.

### Does this PR introduce _any_ user-facing change?

No, the config item is optional, no broken change.

### How was this patch tested?
UT.

Closes #2660 from turboFei/hosts_patterns.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-08 22:43:19 +08:00
Wang, Fei
6d2e1c8d46
[CELEBORN-1521][FOLLOWUP] Move BasicPrincipal to spi module
### What changes were proposed in this pull request?

This PS is a followup of CELEBORN-1521, move the BasicPrincipal in to celeborn-spi module. so that customer do not need to implement it by themselves.
### Why are the changes needed?

For authentication extension.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Existing GA.

Closes #2665 from turboFei/spi_follow.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-08-07 14:42:29 +08:00
Wang, Fei
a599ff2afe [CELEBORN-1535] Support to disable master workerUnavailableInfo expiration
### What changes were proposed in this pull request?

In this pr, it supports to disable the worker unavailable expiration by setting the timeout to -1.

### Why are the changes needed?

In our use case, we want to reserve all the worker unavailable information.
It is acceptable if we use the fixed ports and hosts, and will not occupy much memory resource.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Not needed.

Closes #2657 from turboFei/disable_Cleanup.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-07 08:22:36 +08:00
Sanskar Modi
ca15eea8be [CELEBORN-1511] Add support for custom master endpoint resolver
### What changes were proposed in this pull request?

Proposing to add a master endpoint resolver, which makes the master endpoints discovery extensible and users can leverage this to create and pass different types of resolver which best fit to their need.

### Why are the changes needed?

Currently celeborn support passing master endpoints by these celeborn configs `celeborn.master.endpoints` and `celeborn.master.internal.endpoints` and the allowed pattern for these configs `<host1>:<port1>[,<host2>:<port2>]*`. Workers and Clients both use above configs to connect with master.

The problem with this approach is that currently it takes static host or IP or domain address which can change over time for a long running worker or client. Ex – Master node going down, domain UUID changed. In our infra this discovery is done by a passing a service group which actively watch the nodes for master service and but there is no way to make it work with celeborn as currently celeborn only works with static addresses.

### Does this PR introduce _any_ user-facing change?

Default behaviour will remain same but user can now pass their own master endpoint resolver.

### How was this patch tested?

Added new UTs

Closes #2629 from s0nskar/masterresolver.

Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-07 08:12:41 +08:00
Wang, Fei
dceef479bb [CELEBORN-1541] Enhance the readable address for internal port
### What changes were proposed in this pull request?
If the internal port is not defined, do not show the internal port(-1) in the readable address.

### Why are the changes needed?

If the workerInfo is applied for the RESTful request, such as curl `/api/v1/workers/exclude`, the internal port is always `-1`.

It is not necessary to show the `-1` in the readable address.

### Does this PR introduce _any_ user-facing change?
Just reduce the unnecessary info.

### How was this patch tested?

Not needed.

Closes #2659 from turboFei/internal_port.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-08-02 15:06:19 +08:00
SteNicholas
893085c9c0 [CELEBORN-1473] TransportClientFactory should register netty memory metric with source for shared pooled ByteBuf allocator
### What changes were proposed in this pull request?

`TransportClientFactory` registers netty memory metric with source of `TransportContext` for shared pooled ByteBuf allocator.

Address https://github.com/apache/celeborn/pull/2585#issuecomment-2183720137.

### Why are the changes needed?

The default value of `celeborn.network.memory.allocator.share` is true, which means that enables shared memory allocator at default. Meanwhile, `RpcEnv#create` does not create `TransportClientFactory` with source, which cause lack of `NettyMemoryMetric` for `TransportClientFactory`.

Therefore, `TransportClientFactory` should also register `NettyMemoryMetric` with `WorkerSource` of `TransportContext` for netty memory metrics of shared pooled memory.

![Untitled-2024-06-23-0500](https://github.com/user-attachments/assets/e527be5f-d155-4129-ad49-31a351616b58)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `celeborn.network.memory.allocator.allowCache=false`
```
$ curl http://celeborn-worker:9096/metrics|grep metrics_shared_pool_used
metrics_shared_pool_1_usedHeapMemory_Value{hostName="celeborn-worker",role="Worker"} 29360128 1722505389446
metrics_shared_pool_1_usedDirectMemory_Value{hostName="celeborn-worker",role="Worker"} 188743680 1722505389446
```

- `celeborn.network.memory.allocator.allowCache=true`
```
$ curl http://celeborn-worker:9096/metrics|grep metrics_shared_pool_used
metrics_shared_pool_1_usedHeapMemory_Value{hostName="celeborn-worker",role="Worker"} 8388608 1722505646795
metrics_shared_pool_1_usedDirectMemory_Value{hostName="celeborn-worker",role="Worker"} 8388608 1722505646795
metrics_shared_pool_0_usedHeapMemory_Value{hostName="celeborn-worker",role="Worker"} 281018368 1722505646795
metrics_shared_pool_0_usedDirectMemory_Value{hostName="celeborn-worker",role="Worker"} 1438646272 1722505646796
```

Closes #2652 from SteNicholas/CELEBORN-1473.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-08-01 23:11:44 +08:00
Wang, Fei
ea6617c0d5 [CELEBORN-1521] Introduce celeborn-spi module for authentication extensions
### What changes were proposed in this pull request?
Introduce celeborn-spi module for authentication extensions.

### Why are the changes needed?
Address comments: https://github.com/apache/celeborn/pull/2632#issuecomment-2247132115

### Does this PR introduce _any_ user-facing change?
No, this interface has not been released.

### How was this patch tested?

UT.

Closes #2644 from turboFei/celeborn_spi.

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
2024-07-25 00:52:00 -07:00
SteNicholas
c0ca9523f9 [CELEBORN-1190][FOLLOWUP] Fix WARNING of error prone
### What changes were proposed in this pull request?

- Fix `WARNING` of error prone.
- Disable `EmptyCatch`, `JdkObsolete`, `MutableConstantField` and `UnnecessaryParentheses`.

### Why are the changes needed?

There are many `WARNING` generated by error prone. We should follow the suggestion of error prone to fix `WARNING`.

```
$ mvn clean install -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/SaslUtils.java:[44,25] [MutableConstantField] Constant field declarations should use the immutable type (such as ImmutableList) instead of the general collection interface type (such as List)
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/SaslUtils.java:[47,18] [MutableConstantField] Constant field declarations should use the immutable type (such as ImmutableList) instead of the general collection interface type (such as List)
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/client/TransportClientBootstrap.java:[34,5] [InvalidParam] Parameter name `channel` is unknown.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/client/TransportResponseHandler.java:[96,29] [StaticAssignmentInConstructor] This assignment is to a static field. Mutating static state from a constructor is highly error-prone.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/client/TransportResponseHandler.java:[104,30] [StaticAssignmentInConstructor] This assignment is to a static field. Mutating static state from a constructor is highly error-prone.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/anonymous/AnonymousSaslServerFactory.java:[67,2] [ClassCanBeStatic] Inner class is non-static but does not reference enclosing class
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/meta/FileInfo.java:[60,17] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/util/TransportFrameDecoder.java:[54,46] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManager.java:[207,29] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManager.java:[216,28] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/anonymous/AnonymousSaslClientFactory.java:[73,2] [ClassCanBeStatic] Inner class is non-static but does not reference enclosing class
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/network/sasl/anonymous/AnonymousSaslClientFactory.java:[93,31] [DefaultCharset] Implicit use of the platform default charset, which can result in differing behaviour between JVM executions or incorrect behavior if the encoding of the data source doesn't match expectations.
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/util/ExceptionUtils.java:[65,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/common/src/main/java/org/apache/celeborn/common/util/ExceptionUtils.java:[66,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/SslSampleConfigs.java:[164,16] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/SslSampleConfigs.java:[165,14] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/SslSampleConfigs.java:[165,35] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/SSLTransportClientFactorySuiteJ.java:[32,14] [MissingOverride] setUp overrides method in TransportClientFactorySuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/SSLTransportClientFactorySuiteJ.java:[40,14] [MissingOverride] tearDown overrides method in TransportClientFactorySuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/protocol/EncryptedMessageWithHeaderSuiteJ.java:[124,6] [UseCorrectAssertInTests] Java assert is used in test. For testing purposes Assert.* matchers should be used.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/RpcIntegrationSuiteJ.java:[255,15] [UnusedMethod] Private method 'assertErrorAndClosed' is never used.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/RpcIntegrationSuiteJ.java:[154,17] [UnusedNestedClass] This nested class is unused, and can be removed.
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/RpcIntegrationSuiteJ.java:[57,15] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManagerSuiteJ.java:[107,10] [AssertThrowsMultipleStatements] The lambda passed to assertThrows should contain exactly one statement
[WARNING] /Users/nicholas/Github/celeborn/common/src/test/java/org/apache/celeborn/common/network/ssl/ReloadingX509TrustManagerSuiteJ.java:[134,10] [AssertThrowsMultipleStatements] The lambda passed to assertThrows should contain exactly one statement
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/read/LocalPartitionReader.java:[84,31] [StaticAssignmentInConstructor] This assignment is to a static field. Mutating static state from a constructor is highly error-prone.
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[130,6] [ThreadLocalUsage] ThreadLocals should be stored in static fields
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[714,6] [MissingCasesInEnumSwitch] Non-exhaustive switch; either add a default or handle the remaining cases: SUCCESS, PARTIAL_SUCCESS, REQUEST_FAILED, and 43 others
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1609,10] [MissingCasesInEnumSwitch] Non-exhaustive switch; either add a default or handle the remaining cases: PARTIAL_SUCCESS, REQUEST_FAILED, SHUFFLE_ALREADY_REGISTERED, and 45 others
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1648,26] [MissingOverride] updateFileGroup implements method in ShuffleClient; expected Override
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1654,57] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:[1823,32] [MissingOverride] getDataClientFactory implements method in ShuffleClient; expected Override
[WARNING] /Users/nicholas/Github/celeborn/client/src/test/java/org/apache/celeborn/client/ShuffleClientSuiteJ.java:[185,6] [UseCorrectAssertInTests] Java assert is used in test. For testing purposes Assert.* matchers should be used.
[WARNING] /Users/nicholas/Github/celeborn/service/src/main/java/org/apache/celeborn/server/common/service/store/db/DbServiceManagerImpl.java:[70,33] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/service/src/main/java/org/apache/celeborn/server/common/service/store/db/DbServiceManagerImpl.java:[71,33] [JavaUtilDate] Date has a bad API that leads to bugs; prefer java.time.Instant or LocalDate.
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/HARaftServer.java:[424,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/HARaftServer.java:[425,11] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/HARaftServer.java:[496,55] [UnescapedEntity] This looks like a type with type parameters. The < and > characters here will be interpreted as HTML, which can be avoided by wrapping it in a {code } tag.
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/SingleMasterMetaManager.java:[166,14] [MissingOverride] handleUpdatePartitionSize implements method in IMetadataHandler; expected Override
[WARNING] /Users/nicholas/Github/celeborn/master/src/main/java/org/apache/celeborn/service/deploy/master/SlotsAllocator.java:[298,61] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/MapPartitionDataReader.java:[346,37] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[202,33] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[300,31] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[497,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[503,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:[513,39] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/CreditStreamManager.java:[256,12] [ClassCanBeStatic] Inner class is non-static but does not reference enclosing class
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/WorkerSecretRegistryImpl.java:[73,12] [CacheLoaderNull] The result of CacheLoader#load must be non-null.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[69,13] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[73,13] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[103,24] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ReducePartitionDataWriter.java:[104,39] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/MapPartitionDataWriter.java:[261,46] [ByteBufferBackingArray] ByteBuffer.array() shouldn't be called unless ByteBuffer.arrayOffset() is used or if the ByteBuffer was initialized using ByteBuffer.wrap() or ByteBuffer.allocate().
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ChunkStreamManager.java:[102,40] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/ChunkStreamManager.java:[109,40] [NonAtomicVolatileUpdate] This update of a volatile variable is non-atomic
[WARNING] /Users/nicholas/Github/celeborn/worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/PartitionFilesSorter.java:[318,39] [IntLongMath] Expression of type int may overflow before being assigned to a long
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/FetchHandlerSuiteJ.java:[133,6] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/network/SSLRequestTimeoutIntegrationSuiteJ.java:[32,14] [MissingOverride] setUp overrides method in RequestTimeoutIntegrationSuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/network/SSLRequestTimeoutIntegrationSuiteJ.java:[40,14] [MissingOverride] tearDown overrides method in RequestTimeoutIntegrationSuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/ChunkFetchIntegrationSuiteJ.java:[74,15] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/ChunkFetchIntegrationSuiteJ.java:[186,47] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/SSLReducePartitionDataWriterSuiteJ.java:[30,26] [MissingOverride] createModuleTransportConf overrides method in DiskReducePartitionDataWriterSuiteJ; expected Override
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/local/DiskReducePartitionDataWriterSuiteJ.java:[234,47] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/worker/src/test/java/org/apache/celeborn/service/deploy/worker/storage/memory/MemoryReducePartitionDataWriterSuiteJ.java:[198,47] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
```
```
$ mvn clean install -Pspark-2.4 -pl client-spark/common,client-spark/spark-2 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SortBasedPusher.java:[109,57] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SendBufferPool.java:[56,14] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SendBufferPool.java:[57,21] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java:[247,14] [UnusedMethod] Private method 'executorCores' is never used.
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java:[120,55] [ReferenceEquality] Comparison using reference equality instead of value equality
```
```
$ mvn clean install -Pspark-3.5 -pl client-spark/spark-3 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/CelebornShuffleDataIO.java:[65,17] [MissingOverride] supportsReliableStorage implements method in ShuffleDriverComponents; expected Override
[WARNING] /Users/nicholas/Github/celeborn/client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkShuffleManager.java:[163,55] [ReferenceEquality] Comparison using reference equality instead of value equality
```
```
$ mvn clean install -Pflink-1.14 -pl client-flink/common,client-flink/flink-1.14 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/readclient/CelebornBufferStream.java:[161,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/readclient/CelebornBufferStream.java:[223,27] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/ShuffleTaskInfo.java:[46,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[99,66] [JdkObsolete] It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[236,21] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[251,19] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[267,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[354,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[392,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[473,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/RemoteShuffleInputGateDelegation.java:[533,17] [SynchronizeOnNonFinalField] Synchronizing on non-final fields is not safe: if the field is ever updated, different threads may end up locking on different objects.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/buffer/TransferBufferPool.java:[182,33] [MixedMutabilityReturnType] This method returns both mutable and immutable collections or maps from different paths. This may be confusing for users of the method.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/main/java/org/apache/celeborn/plugin/flink/utils/FlinkUtils.java:[34,6] [DoubleBraceInitialization] Prefer collection factory methods or builders to the double-brace initialization pattern.
[WARNING] /Users/nicholas/Github/celeborn/client-flink/common/src/test/java/org/apache/celeborn/plugin/flink/BufferPackSuiteJ.java:[207,6] [CatchAndPrintStackTrace] Logging or rethrowing exceptions should usually be preferred to catching and calling printStackTrace
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.14/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pflink-1.15 -pl client-flink/flink-1.15 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.15/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pflink-1.17 -pl client-flink/flink-1.16 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
```
```
$ mvn clean install -Pflink-1.17 -pl client-flink/flink-1.17 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.17/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pflink-1.18 -pl client-flink/flink-1.18 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.18/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pflink-1.19 -pl client-flink/flink-1.19 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
[WARNING] /Users/nicholas/Github/celeborn/client-flink/flink-1.19/src/test/java/org/apache/celeborn/plugin/flink/RemoteShuffleResultPartitionSuiteJ.java:[140,67] [CanonicalDuration] Duration can be expressed more clearly with different units
```
```
$ mvn clean install -Pmr -pl client-mr/mr -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

```
$ mvn clean install -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pspark-2.4 -pl client-spark/common,client-spark/spark-2 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pspark-3.5 -pl client-spark/spark-3 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.14 -pl client-flink/common,client-flink/flink-1.14 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.15 -pl client-flink/flink-1.15 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.16 -pl client-flink/flink-1.15 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.17 -pl client-flink/flink-1.17 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.18 -pl client-flink/flink-1.18 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pflink-1.19 -pl client-flink/flink-1.19 -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
$ mvn clean install -Pmr -pl client-mr/mr -DskipTests -Dcheckstyle.skip=true -Drat.skip=true -Dspotless.check.skip=true|grep WARNING|grep java
```

Closes #2555 from SteNicholas/CELEBORN-1190.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
2024-07-25 01:21:15 -05:00
Sanskar Modi
0a68ae049c
[CELEBORN-1520] Minor logging fix for AppDiskUsageMetric and Fixed UTs
### What changes were proposed in this pull request?

- Minor logging change in AppDiskUsageMetric which are not critical but slightly bothering.
- Fixed UT's for AppDiskUsageMetric

### Why are the changes needed?

1. Current AppDiskUsageMetric UTs were like placeholder and just printing the output. They were not testing/verifying anything.
2. Minor logging change with AppDiskUsageMetric.
- Comma formating was wrong

```
Snapshot start 2024-07-24T08:47:12.496 end 2024-07-24T08:57:12.497 Application application_1717149813731_19042841_2 used approximate 15.9 GiB ,Application application_1717149813731_19042841_1 used approximate 13.9 GiB
```

- We were printing an extra empty line after each summary.

```
211:20:24.339 [master-app-disk-usage-metrics-logger] INFO  org.apache.celeborn.common.meta.AppDiskUsageMetric - App Disk Usage Top50 Report
Snapshot start 2024-07-24T09:17:12.498 end 2024-07-24T09:27:12.498 Application application_XXX used approximate 14.5 GiB
Snapshot start 2024-07-24T08:17:12.495 end 2024-07-24T08:27:12.496 Application application_XXX used approximate 15.9 GiB

11:27:12.507 [master-app-disk-usage-metrics-logger] INFO  org.apache.celeborn.common.meta.AppDiskUsageMetric - App Disk Usage Top50 Report
Snapshot start 2024-07-24T09:17:12.498 end 2024-07-24T09:27:12.498 Application application_XXX used approximate 14.5 GiB
Snapshot start 2024-07-24T08:17:12.495 end 2024-07-24T08:27:12.496 Application application_XXX used approximate 15.9 GiB
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Fixed current UTs and verified from the logs.

Closes #2643 from s0nskar/app_disk_usage.

Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-07-25 13:47:12 +08:00
zhaohehuhu
7a596bbed1 [CELEBORN-1469] Support writing shuffle data to OSS(S3 only)
### What changes were proposed in this pull request?

as title

### Why are the changes needed?

Now, Celeborn doesn't support sinking shuffle data directly to Amazon S3, which could be a limitation when we're trying to move on-premises servers to AWS and use S3 as a data sink for shuffled data.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Closes #2579 from zhaohehuhu/dev-0619.

Authored-by: zhaohehuhu <luoyedeyi@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-07-24 11:59:15 +08:00
zky.zhoukeyong
d692e49089 [CELEBORN-1446] Enable chunk prefetch when initialize CelebornInputStream
### What changes were proposed in this pull request?
https://github.com/apache/celeborn/pull/2348 avoids fetching first chunk in the constructor
of `CelebornInputStreamImpl`, but in some cases, i.e. coalescing 3000 partitions into one in Spark,
it can be beneficial to do so for performance. This PR adds back prefetching with knobs default to false.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
Yes, two configs are added.

### How was this patch tested?
Extended `MemorySkewJoinSuite` and `ReusedExchangeSuite`, and manual test.

Closes #2549 from waitinfuture/1446.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-07-18 19:46:47 +08:00
wangshengjie3
32509d5962 [CELEBORN-1439] Fix revive logic bug which will casue data correctness issue and job failiure
### What changes were proposed in this pull request?
Fix revive logic bug to avoid data correctness issue

### Why are the changes needed?
Current logic will revive same partition-epoch multi times

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
UT

Closes #2532 from wangshengjie123/fix-revive.

Lead-authored-by: wangshengjie3 <soldier.sj.wang@gmail.com>
Co-authored-by: wangshengjie <wangshengjie3@xiaomi.com>
Co-authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-07-18 09:35:53 +08:00
mingji
04c4b857a1 [CELEBORN-914][FOLLOWUP] Add aggressive mode to evict memory shuffle file
### What changes were proposed in this pull request?
Add an aggressive mode to evict memory shuffle files.

### Why are the changes needed?
To evict more shuffle files to reduce memory pressure.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA.

Closes #2602 from FMX/b914-4.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-07-15 10:27:30 +08:00
zky.zhoukeyong
8d0b4cf4cd [CELEBORN-1506][BUG] Revert "[CELEBORN-1036][FOLLOWUP] totalInflightReqs should decrement when batchIdSet contains the batchId to avoid duplicate caller of removeBatch"
### What changes were proposed in this pull request?
One of our users reported a dataloss issue in https://github.com/apache/celeborn/pull/2612 , I tried to reproduce
the bug with the following setup:
1. Partition data is far larger than `spark.celeborn.client.shuffle.partitionSplit.threshold`, which means split happens very often
2. `spark.celeborn.client.shuffle.partitionSplit.threshold` is larger than `celeborn.worker.shuffle.partitionSplit.max`, which means when split happens, it is `HARD_SPLIT`
3. `celeborn.client.shuffle.batchHandleChangePartition.enabled` is true, which means when hard split happens, `LifecycleManager` will commit the splits before the stage finishes

Configs in spark side:
```
spark.celeborn.client.push.maxReqsInFlight.perWorker | 256
spark.celeborn.client.push.maxReqsInFlight.total | 2048
spark.celeborn.client.shuffle.batchHandleCommitPartition.enabled | true
spark.celeborn.client.shuffle.compression.codec | zstd
spark.celeborn.client.shuffle.partitionSplit.threshold | 48m
spark.celeborn.client.spark.fetch.throwsFetchFailure | true
spark.celeborn.client.spark.push.sort.memory.adaptiveThreshold | true
spark.celeborn.client.spark.push.sort.memory.threshold | 512m
spark.celeborn.client.spark.shuffle.writer | sort
spark.celeborn.master.endpoints | master-1-1:9097

```
Configs in celeborn side:
```
celeborn.metrics.enabled=false
celeborn.replicate.io.numConnectionsPerPeer=24
celeborn.application.heartbeat.timeout=120s
celeborn.worker.storage.dirs=/mnt/disk1,/mnt/disk2
celeborn.network.timeout=2000s
celeborn.ha.enabled=false
celeborn.worker.closeIdleConnections=true
celeborn.worker.monitor.disk.enabled=false
celeborn.worker.flusher.threads=16

celeborn.worker.graceful.shutdown.enabled=true
celeborn.worker.rpc.port=9100
celeborn.worker.push.port=9101
celeborn.worker.fetch.port=9102
celeborn.worker.replicate.port=9103

celeborn.worker.shuffle.partitionSplit.max=10m  // this is made to be small
```

My query on 10T TPCDS:
```
select
max(ss_sold_time_sk      ),
max(ss_item_sk           ),
max(ss_customer_sk       ),
max(ss_cdemo_sk          ),
max(ss_hdemo_sk          ),
max(ss_addr_sk           ),
max(ss_store_sk          ),
max(ss_promo_sk          ),
max(ss_ticket_number     ),
max(ss_quantity          ),
max(ss_wholesale_cost    ),
max(ss_list_price        ),
max(ss_sales_price       ),
max(ss_ext_discount_amt  ),
max(ss_ext_sales_price   ),
max(ss_ext_wholesale_cost),
max(ss_ext_list_price    ),
max(ss_ext_tax           ),
max(ss_coupon_amt        ),
max(ss_net_paid          ),
max(ss_net_paid_inc_tax  ),
max(ss_net_profit        ),
max(ss_sold_date_sk      )
from (
select * from store_sales where ss_sold_date_sk is not null distribute by ss_sold_date_sk
) a;
```

After digging into it, I found the bug is introduced by https://github.com/apache/celeborn/pull/2134 . #2134 added
check in `InFlightRequestTracker#addBatch` and `InFlightRequestTracker#removeBatch` and only
increments/decrements `totalInflightReqs`  when `batchIdSet` contains current `batchId`, which conflicts with
`ShuffleClientImpl#PushDataRpcResponseCallback#updateLatestPartition`, which calls `addBatch` first then calls
`removeBatch` with the same batchId. As a result, the call to `addBatch` fails to increment `totalInflightReqs`, but
the call to `removeBatch` decrements `totalInflightReqs`, which means the retried push is not counted, then later
`limitZeroInFlight` in `mapperEnd` will return even though the retried push fails.

This PR fixes the bug by reverting #2134

### Why are the changes needed?
ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Closes #2621 from waitinfuture/1506.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-07-13 13:09:46 +08:00
Fei Wang
09d3a3b05f [CELEBORN-1493] Check admin privileges for http mutative requests
### What changes were proposed in this pull request?

If authentication enabled, check admin privileges for http mutative requests.

Likes:

```
POST /api/v1/workers/exclude
POST /api/v1/workers/events
POST /api/v1/workers/exit
```

### Why are the changes needed?

For security requirement.

### Does this PR introduce _any_ user-facing change?
Yes, after this pr, if http authentication enabled, for all mutative http requests, it will check the admin privileges.

Before this PR, if an API is not defined and the method is `POST/PUT/DELETE/PATCH`, the response status code is `404`.
After this PR, if the admin privileges check failed, the response status code will be `403`.

### How was this patch tested?
UT.

Closes #2601 from turboFei/admin_check.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-07-12 11:51:35 +08:00
Fei Wang
d698a69edc
[CELEBORN-1477][CIP-9] Refine the celeborn RESTful APIs
### What changes were proposed in this pull request?

This PR is for [CIP-9 Refine the celeborn RESTful APIs](https://docs.google.com/document/d/1LV2vV-w3XtlbJj2Vi4J77mt4IYCr40-8A_JncZLsHqs/edit?usp=sharing).

We leverage [openapi-generator](https://github.com/OpenAPITools/openapi-generator) to generate the client and model code.

### Why are the changes needed?

Celeborn has implemented RESTful APIs for monitoring and administrative operations on both master and worker endpoints. These APIs enable tasks such as configuration checks, status viewing of master/worker nodes, worker decommissioning/recommissioning, and more. They provide crucial insights and support for DevOps.
The primary concern with the existing API is the response content type, which is `text/plain` rather than the more widely accepted `application/json`. This mismatch makes integration with DevOps tools challenging, as these tools typically require JSON-formatted responses for seamless parsing and automation.
And I also saw the need for REST API evolution in[ Apache Celeborn CLI Proposal](https://cwiki.apache.org/confluence/display/CELEBORN/CIP-7+Celeborn+CLI).

### Does this PR introduce _any_ user-facing change?
This pr introduce  a new API namespace: `/api/v1`. This approach allows us to maintain the current API for compatibility while offering an improved version.

### How was this patch tested?
UT.

Closes #2599 from turboFei/cip_9_openapi.

Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-07-11 10:57:00 +08:00
Mridul Muralidharan
e76e445242
[CELEBORN-1494] Support IPv6 addresses in PbSerDeUtils.fromPackedPartitionLocations
### What changes were proposed in this pull request?

When Celeborn runs under IPv6, client side deserialization fails when workers locations are IPv6 addresses.
This is because `PbSerDeUtils.fromPackedPartitionLocations` does a split by `":"` - which does not work when host is an IPv6 address

### Why are the changes needed?

Fix IPv6 support

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test added - test fails without changes to `PbSerDeUtils.fromPackedPartitionLocations`

Closes #2606 from mridulm/fix-ip-v6.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-07-09 17:40:12 +08:00
mingji
e68e1729d9 [CELEBORN-1483] Add storage policy
### What changes were proposed in this pull request?
To refactor partition data writer.
Part of CIP-8.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
GA

Closes #2595 from FMX/b1483.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-07-09 17:16:09 +08:00
Fei Wang
f6916317ec
[CELEBORN-1318][FOLLOWUP] Transfer extraInfo for http authentication providers
### What changes were proposed in this pull request?
I am implementing the plugin for Bearer token authentication, and I found that, in ebay, the tokens are bound to client IP.

So, I also need to transfer the clientIp for token validation, I wonder that it is a general case.

This pr is a followup for Http password/token authentication and extend the current interface api.

### Why are the changes needed?
To extend the token authentication use case in case that we need more properties associate with the token.

### Does this PR introduce _any_ user-facing change?

No, this interface `TokenAuthenticationProvider` has not been released.

### How was this patch tested?

Not needed.

Closes #2604 from turboFei/auth_properties.

Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-07-06 00:50:37 +08:00
Mridul Muralidharan
c90a1647af [CELEBORN-1489] Update Flink support with authentication support
### What changes were proposed in this pull request?
Fix authentication support for Apache Flink.

### Why are the changes needed?
Without these changes, Apache Flink applications fail when Celeborn cluster has authentication enabled.

### Does this PR introduce _any_ user-facing change?

Fixes authentication support for Apache Flink integration

### How was this patch tested?

This is forward port + adaptation of changes we did internally (against 0.4) when testing Apache Flink applications against Celeborn cluster with authentication (and TLS) enabled.
Integration test has been updated to additionally test for Flink applications with authentication enabled in Celeborn cluster.

Closes #2596 from mridulm/fix-flink-auth-support.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-07-04 10:17:56 +08:00
SteNicholas
c311a35516
[CELEBORN-1485] Refactor addCounter, addGauge and addTimer of AbstractSource to reduce CPU utilization
### What changes were proposed in this pull request?

Refactor `addCounter`, `addGauge` and `addTimer` of `AbstractSource` to reduce CPU utilization.

### Why are the changes needed?

`addCounter`, `addGauge` and `addTimer` of `AbstractSource` checks whether the metric key exist via `MetricRegistry#getMetrics` which iterates all metrics and put into map at present. It causes that adding counter of active connection count metric for application dimension would increase high CPU utilization when there are many active connections:

<img width="1350" alt="image" src="https://github.com/apache/celeborn/assets/10048174/cc882fac-eec1-417b-ba17-f3012053c6c7">

The implementation of `MetricRegistry#getMetrics` is as follows:

```
private <T extends Metric> SortedMap<String, T> getMetrics(Class<T> klass, MetricFilter filter) {
    final TreeMap<String, T> timers = new TreeMap<>();
    for (Map.Entry<String, Metric> entry : metrics.entrySet()) {
        if (klass.isInstance(entry.getValue()) && filter.matches(entry.getKey(), entry.getValue())) {
            timers.put(entry.getKey(), (T) entry.getValue());
        }
    }
    return Collections.unmodifiableSortedMap(timers);
}
```

Refactor `addCounter`, `addGauge` and `addTimer` of `AbstractSource` to reduce CPU utilization.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Cluster test.

<img width="1345" alt="image" src="https://github.com/apache/celeborn/assets/10048174/4c0a7f92-3cc5-45f8-941f-e1d0166043e1">

Closes #2593 from SteNicholas/CELEBORN-1485.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-07-01 20:16:28 +08:00
mingji
a9d1d64cf4 [CELEBORN-914][FOLLOWUP] optimize write and sort logic for memory storage
### What changes were proposed in this pull request?
1. Optimize writer check offsets logic
2. Optimize sort memory file logic

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA.

Closes #2581 from FMX/b914-1.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-06-21 16:17:59 +08:00
Fei Wang
5cea9cc7f2
[CELEBORN-1318] Support celeborn http authentication
### What changes were proposed in this pull request?
Support celeborn master/worker http authentication.

### Why are the changes needed?
Authentication is needed for celeborn admin APIs.

### Does this PR introduce _any_ user-facing change?
Yes, introduce authentication related config items, but does not break the current behavior.

### How was this patch tested?

Added UT for BASIC and Bearer authentication.

Closes #2440 from turboFei/http_auth.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-06-20 10:35:12 +08:00
Xianming Lei
759638e0b1
[CELEBORN-1467] celeborn.worker.storage.dirs should support soft link
### What changes were proposed in this pull request?
`celeborn.worker.storage.dirs` supports soft link

### Why are the changes needed?
NPE will be thrown when StorageManager.createWriter if `celeborn.worker.storage.dirs` contains soft link.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing uts.

Closes #2576 from leixm/CELEBORN-1467.

Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-06-19 10:44:38 +08:00
binjie yang
565e39ac1b
[CELEBORN-1463][FOLLOWUP] Respeact client/server threads num to avoid competitiveness
### What changes were proposed in this pull request?
 #2570
Respect client/server threads num when create transport client/server memory allocator.

### Why are the changes needed?
Follow up, as title.

Respeact client/server threads num to avoid competitiveness

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No

Closes #2571 from zwangsheng/CELEBORN-1463-follow-up.

Authored-by: binjie yang <yangbinjie@shizhuang-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-06-18 10:50:36 +08:00
Aravind Patnam
cff2a7daa0
[CELEBORN-1461] Fix Celeborn ipv6 local hostname resolution
### What changes were proposed in this pull request?
Fix ipv6 hostname resolution to have `[]` around the ipv6 address.

### Why are the changes needed?
Previously was getting error:
```
java.lang.AssertionError: assertion failed: Expected hostname or IPv6 IP enclosed in [] but got <ipv6 address>
  	at scala.Predef$.assert(Predef.scala:223)
  	at org.apache.celeborn.common.util.Utils$.checkHost(Utils.scala:429)
  	at org.apache.celeborn.service.deploy.worker.Worker.<init>(Worker.scala:136)
  	at org.apache.celeborn.service.deploy.worker.Worker$.main(Worker.scala:953)
  	at org.apache.celeborn.service.deploy.worker.Worker.main(Worker.scala)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
started the worker and the service starts up now and binds to ipv6 address

Closes #2564 from akpatnam25/CELEBORN-1461.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-06-17 20:13:45 +08:00
binjie yang
f786390e61
[CELEBORN-1463] Create network memory allocator with celeborn.network.memory.allocator.numArenas
### What changes were proposed in this pull request?
Create netty pooled network memory allocator with `celeborn.network.memory.allocator.numArenas`.

### Why are the changes needed?
Before this, when user call to build non shared pooled memory allocator will use the number of module client threads as the
core number.

Intuitively we should respect to build memory allocator with user define `celeborn.network.memory.allocator.numArenas`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
None

Closes #2570 from zwangsheng/CELEBORN-1463.

Authored-by: binjie yang <yangbinjie@shizhuang-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-06-17 11:21:07 +08:00
Shuang
0298cfb355 [CELEBORN-1337][FOLLOWUP] Fix compile problem
### What changes were proposed in this pull request?
Fix compile problem as CELEBORN-1337 auto merge conflict with CELEBORN-1444.

### Why are the changes needed?
Fix compile problem as 1337 conflict with CELEBORN-1444.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
PASS GA

Closes #2559 from RexXiong/minor_fix_compile_problem.

Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-06-13 12:17:20 +08:00
Erik.fang
5323c1d009 [CELEBORN-1337] Remove unused fields from HeartbeatFromApplicationResponse
as discussed in https://github.com/apache/celeborn/pull/2398, this PR removed unused fields from HeartbeatFromApplicationResponse, without adding WorkerId Type

Closes #2529 from ErikFang/remove-unused-fields-HeartbeatFromApplicationResponse.

Authored-by: Erik.fang <fmerik@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-06-13 10:15:00 +08:00
xinyuwang1
4f039d5f71 [CELEBORN-1453] Fix the thread safety bug in getMetrics
### What changes were proposed in this pull request?
Fix the thread safety bug in getMetrics of AbstractSource by changing the lock scope

### Why are the changes needed?
When two threads access the getMetrics method in AbstractSource at the same time, one of the threads may get fewer metrics than the actual value, because the actual execution order may be like this: Thread A gets the lock, adds the metrics of the worker source to the innerMetrics queue and releases the lock, Thread B gets the lock, adds the metrics of the worker source to the innerMetrics queue and releases the lock, Thread A gets the lock, adds the metrics of other sources to the innerMetrics queue, assembles the values of innerMetrics, clears innerMetrics and releases the lock, Thread B gets the lock, adds the metrics of other sources to the innerMetrics queue, assembles the values of innerMetrics, clears innerMetrics and releases the lock. The result of this is that Thread A gets two sets of metrics data from the worker source, while Thread B doesn't get any.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
manual test

Closes #2548 from littlexyw/get_metrics_fix.

Authored-by: xinyuwang1 <xinyuwang1@xiaohongshu.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-06-08 11:30:11 +08:00
Xianming Lei
999510b265 [CELEBORN-1444] Introduce worker decommission metrics and corresponding REST API
### What changes were proposed in this pull request?

Introduce worker decommission metrics and corresponding REST API.

### Why are the changes needed?

In a production environment, due to certain hardware or environmental reasons, our script will automatically decommission the node. At this time, we need to distinguish between graceful shutdown nodes and decommissioned nodes.

If we distinguish shutdown worker and decommission worker metrics, we can achieve better operation and maintenance.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

- `DefaultMetaSystemSuiteJ#testHandleReportWorkerDecommission`
- `RatisMasterStatusSystemSuiteJ#testHandleReportWorkerDecommission`
- `ApiMasterResourceSuite#decommissionWorkers`
- `ApiWorkerResourceSuite#isDecommissioning`

Closes #2535 from leixm/issue_1444.

Lead-authored-by: Xianming Lei <jerrylei@apache.org>
Co-authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-06-08 11:10:31 +08:00
SteNicholas
7188e845f7
[CELEBORN-1327][FOLLOWUP] Simplify DirectByteBuffer constructor lookup logic
### What changes were proposed in this pull request?

Simplify `DirectByteBuffer` constructor lookup logic in `Platform`. Meanwhile, bump `commons-lang3` version from `3.12.0` to `3.13.0`.

### Why are the changes needed?

`try-catch` statement is not needed because we know version number already.

Backport:

- https://github.com/apache/spark/pull/41780
- https://github.com/apache/spark/pull/42269
- https://github.com/apache/spark/pull/44444

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #2544 from SteNicholas/CELEBORN-1327.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-06-07 16:23:32 +08:00
SteNicholas
a5d3f0f30e
[CELEBORN-1182][FOLLOWUP] WorkerSource should use Counter to support application dimension ActiveConnectionCount metric
### What changes were proposed in this pull request?

`WorkerSource` should use Counter to support application dimension `ActiveConnectionCount` metric.

Follow up #2167.

### Why are the changes needed?

`WorkerSource` uses `Gauge` for application dimension ActiveConnectionCount metric via `appActiveConnections` at present, which has performance problem in metrics REST API as follows:

```
"worker-JettyThreadPool-11242" #11242 daemon prio=5 os_prio=0 tid=0x00007f410800c000 nid=0x2d80 runnable [0x00007f3426de2000]
   java.lang.Thread.State: RUNNABLE
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableOnce.count(TraversableOnce.scala:118)
	at scala.collection.TraversableOnce.count$(TraversableOnce.scala:116)
	at scala.collection.AbstractTraversable.count(Traversable.scala:108)
	at org.apache.celeborn.service.deploy.worker.WorkerSource.$anonfun$recordAppActiveConnection$1(WorkerSource.scala:104)
	at org.apache.celeborn.service.deploy.worker.WorkerSource$$Lambda$787/1074905995.apply$mcI$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
	at org.apache.celeborn.common.metrics.source.GaugeSupplier$$anon$3.getValue(AbstractSource.scala:466)
	at org.apache.celeborn.common.metrics.source.AbstractSource.recordGauge(AbstractSource.scala:342)
	at org.apache.celeborn.common.metrics.source.AbstractSource.$anonfun$getMetrics$2(AbstractSource.scala:401)
	at org.apache.celeborn.common.metrics.source.AbstractSource.$anonfun$getMetrics$2$adapted(AbstractSource.scala:401)
	at org.apache.celeborn.common.metrics.source.AbstractSource$$Lambda$956/1021547679.apply(Unknown Source)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.celeborn.common.metrics.source.AbstractSource.getMetrics(AbstractSource.scala:401)
	at org.apache.celeborn.common.metrics.sink.AbstractServlet.$anonfun$getMetricsSnapshot$1(AbstractServlet.scala:34)
	at org.apache.celeborn.common.metrics.sink.AbstractServlet$$Lambda$954/1559941228.apply(Unknown Source)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.TraversableLike$$Lambda$33/829149076.apply(Unknown Source)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.celeborn.common.metrics.sink.AbstractServlet.getMetricsSnapshot(AbstractServlet.scala:34)
	at org.apache.celeborn.common.metrics.sink.PrometheusServlet.$anonfun$createServletHandler$1(PrometheusServlet.scala:38)
	at org.apache.celeborn.common.metrics.sink.PrometheusServlet$$Lambda$721/2120532393.apply(Unknown Source)
	at org.apache.celeborn.server.common.http.HttpUtils$$anon$1.doGet(HttpUtils.scala:51)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:497)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:584)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
	at org.eclipse.jetty.server.HttpChannel$$Lambda$636/1962809899.dispatch(Unknown Source)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Cluster test.

```
$ curl http://bigdata-rss-worker:9096/metrics|grep ActiveConnectionCount|grep application
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 54003    0 54003    0     0  2924k      0 --:--:-- --:--:-- --:--:-- 3102k
metrics_ActiveConnectionCount_Count{applicationId="application_1688369676084_17462520_1",hostName="bigdata-rss-worker",role="Worker"} 15 1717590356773
metrics_ActiveConnectionCount_Count{applicationId="application_1650016801129_32165809_1",hostName="bigdata-rss-worker",role="Worker"} 7 171759035677
$ curl http://bigdata-rss-worker:9096/metrics|grep ActiveConnectionCount|grep application
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 54025    0 54025    0     0  2891k      0 --:--:-- --:--:-- --:--:-- 2931k
metrics_ActiveConnectionCount_Count{applicationId="application_1688369676084_17462520_1",hostName="bigdata-rss-worker",role="Worker"} 25 1717590431544
metrics_ActiveConnectionCount_Count{applicationId="application_1650016801129_32165809_1",hostName="bigdata-rss-worker",role="Worker"} 14 1717590431544
$ curl http://bigdata-rss-worker:9096/metrics|grep ActiveConnectionCount|grep application
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 54014    0 54014    0     0  2727k      0 --:--:-- --:--:-- --:--:-- 2776k
metrics_ActiveConnectionCount_Count{applicationId="application_1688369676084_17462520_1",hostName="bigdata-rss-worker",role="Worker"} 19 1717590480837
metrics_ActiveConnectionCount_Count{applicationId="application_1650016801129_32165809_1",hostName="bigdata-rss-worker",role="Worker"} 9 1717590480837
```

Closes #2546 from SteNicholas/CELEBORN-1182.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-06-06 17:25:39 +08:00
SteNicholas
9e5f7e5a4d
[CELEBORN-1449] Fix JavaUtils#deleteRecursivelyUsingJavaIO to skip non-existing file input
### What changes were proposed in this pull request?

Fix `JavaUtils#deleteRecursivelyUsingJavaIO` to skip non-existing file input. Meanwhile, reduce multiple file attribute calls of `JavaUtils#deleteRecursivelyUsingJavaIO`.

### Why are the changes needed?

`deleteRecursivelyUsingJavaIO` is a fallback of `deleteRecursivelyUsingUnixNative` in `JavaUtils`. We should have identical capability for `JavaUtils#deleteRecursivelyUsingJavaIO` which should skip non-existing file input. Meanwhile, `JavaUtils#deleteRecursivelyUsingJavaIO` method performs multiple file attribute calls.

Backport:

- https://github.com/apache/spark/pull/36636
- https://github.com/apache/spark/pull/45346

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #2543 from SteNicholas/CELEBORN-1449.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shaoyun Chen <csy@apache.org>
2024-06-05 14:04:36 +08:00
sychen
dc2826a614 [CELEBORN-1448] Use static regex Pattern instances in JavaUtils.timeStringAs and JavaUtils.byteStringAs
### What changes were proposed in this pull request?

### Why are the changes needed?

[SPARK-48496](https://issues.apache.org/jira/browse/SPARK-48496)[CORE] Use static regex Pattern instances in JavaUtils.timeStringAs and JavaUtils.byteStringAs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2541 from cxzl25/CELEBORN-1448.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2024-06-03 23:42:27 +08:00
SteNicholas
450dac8245
[CELEBORN-1447] Support configuring thread number of worker to wait for commit shuffle data files to finish
### What changes were proposed in this pull request?

Introduce `celeborn.worker.commitFiles.wait.threads` to support configuring thread number of worker to wait for commit shuffle data files to finish.

### Why are the changes needed?

`celeborn.worker.commitFiles.threads` supports the configuration that is the thread number of worker to commit shuffle data files asynchronously including waiting for commit files to finish at present. It should support to configure thread number of waiting for commit shuffle data files to finish which avoids the situation where the commit thread pool is waiting for commit files and no thread could commit shuffle data files.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #2539 from SteNicholas/CELEBORN-1447.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-06-03 17:47:01 +08:00
SteNicholas
e5f09ce4e0 [CELEBORN-1443] Remove ratis dependencies from common module
### What changes were proposed in this pull request?

Remove ratis dependencies from common module.

### Why are the changes needed?

Ratis is only depended on by the master module. Removing ratis dependencies from the common module reduces the size of the Celeborn client package.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #2538 from SteNicholas/CELEBORN-1443.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-06-03 10:15:51 +08:00
SteNicholas
2a57fab869 [CELEBORN-1400] Bump Ratis version from 2.5.1 to 3.0.1
### What changes were proposed in this pull request?

Bump Ratis version from 2.5.1 to 3.0.1. Address incompatible changes:

- RATIS-589. Eliminate buffer copying in SegmentedRaftLogOutputStream.(https://github.com/apache/ratis/pull/964)
- RATIS-1677. Do not auto format RaftStorage in RECOVER.(https://github.com/apache/ratis/pull/718)
- RATIS-1710. Refactor metrics api and implementation to separated modules. (https://github.com/apache/ratis/pull/749)

### Why are the changes needed?

Bump Ratis version from 2.5.1 to 3.0.1. Ratis has released v3.0.0, v3.0.1, which release note refers to [3.0.0](https://ratis.apache.org/post/3.0.0.html), [3.0.1](https://ratis.apache.org/post/3.0.1.html). The 3.0.x version include new features like pluggable metrics and lease read, etc, some improvements and bugfixes including:

- 3.0.0: Change list of ratis 3.0.0 In total, there are roughly 100 commits diffing from 2.5.1 including:
   - Incompatible Changes
      - RaftStorage Auto-Format
      - RATIS-1677. Do not auto format RaftStorage in RECOVER. (https://github.com/apache/ratis/pull/718)
      - RATIS-1694. Fix the compatibility issue of RATIS-1677. (https://github.com/apache/ratis/pull/731)
      - RATIS-1871. Auto format RaftStorage when there is only one directory configured. (https://github.com/apache/ratis/pull/903)
      - Pluggable Ratis-Metrics (RATIS-1688)
      - RATIS-1689. Remove the use of the thirdparty Gauge. (https://github.com/apache/ratis/pull/728)
      - RATIS-1692. Remove the use of the thirdparty Counter. (https://github.com/apache/ratis/pull/732)
      - RATIS-1693. Remove the use of the thirdparty Timer. (https://github.com/apache/ratis/pull/734)
      - RATIS-1703. Move MetricsReporting and JvmMetrics to impl. (https://github.com/apache/ratis/pull/741)
      - RATIS-1704. Fix SuppressWarnings(“VisibilityModifier”) in RatisMetrics. (https://github.com/apache/ratis/pull/742)
      - RATIS-1710. Refactor metrics api and implementation to separated modules. (https://github.com/apache/ratis/pull/749)
      - RATIS-1712. Add a dropwizard 3 implementation of ratis-metrics-api. (https://github.com/apache/ratis/pull/751)
      - RATIS-1391. Update library dropwizard.metrics version to 4.x (https://github.com/apache/ratis/pull/632)
      - RATIS-1601. Use the shaded dropwizard metrics and remove the dependency (https://github.com/apache/ratis/pull/671)
      - Streaming Protocol Change
      - RATIS-1569. Move the asyncRpcApi.sendForward(..) call to the client side. (https://github.com/apache/ratis/pull/635)
   - New Features
      - Leader Lease (RATIS-1864)
      - RATIS-1865. Add leader lease bound ratio configuration (https://github.com/apache/ratis/pull/897)
      - RATIS-1866. Maintain leader lease after AppendEntries (https://github.com/apache/ratis/pull/898)
      - RATIS-1894. Implement ReadOnly based on leader lease (https://github.com/apache/ratis/pull/925)
      - RATIS-1882. Support read-after-write consistency (https://github.com/apache/ratis/pull/913)
      - StateMachine API
      - RATIS-1874. Add notifyLeaderReady function in IStateMachine (https://github.com/apache/ratis/pull/906)
      - RATIS-1897. Make TransactionContext available in DataApi.write(..). (https://github.com/apache/ratis/pull/930)
      - New Configuration Properties
      - RATIS-1862. Add the parameter whether to take Snapshot when stopping to adapt to different services (https://github.com/apache/ratis/pull/896)
      - RATIS-1930. Add a conf for enable/disable majority-add. (https://github.com/apache/ratis/pull/961)
      - RATIS-1918. Introduces parameters that separately control the shutdown of RaftServerProxy by JVMPauseMonitor. (https://github.com/apache/ratis/pull/950)
      - RATIS-1636. Support re-config ratis properties (https://github.com/apache/ratis/pull/800)
      - RATIS-1860. Add ratis-shell cmd to generate a new raft-meta.conf. (https://github.com/apache/ratis/pull/901)
   - Improvements & Bug Fixes
      - Netty
         - RATIS-1898. Netty should use EpollEventLoopGroup by default (https://github.com/apache/ratis/pull/931)
         - RATIS-1899. Use EpollEventLoopGroup for Netty Proxies (https://github.com/apache/ratis/pull/932)
         - RATIS-1921. Shared worker group in WorkerGroupGetter should be closed. (https://github.com/apache/ratis/pull/955)
         - RATIS-1923. Netty: atomic operations require side-effect-free functions. (https://github.com/apache/ratis/pull/956)
      - RaftServer
         - RATIS-1924. Increase the default of raft.server.log.segment.size.max. (https://github.com/apache/ratis/pull/957)
         - RATIS-1892. Unify the lifetime of the RaftServerProxy thread pool (https://github.com/apache/ratis/pull/923)
         - RATIS-1889. NoSuchMethodError: RaftServerMetricsImpl.addNumPendingRequestsGauge https://github.com/apache/ratis/pull/922 (https://github.com/apache/ratis/pull/922)
         - RATIS-761. Handle writeStateMachineData failure in leader. (https://github.com/apache/ratis/pull/927)
         - RATIS-1902. The snapshot index is set incorrectly in InstallSnapshotReplyProto. (https://github.com/apache/ratis/pull/933)
         - RATIS-1912. Fix infinity election when perform membership change. (https://github.com/apache/ratis/pull/954)
         - RATIS-1858. Follower keeps logging first election timeout. (https://github.com/apache/ratis/pull/894)

- 3.0.1:This is a bugfix release. See the [changes between 3.0.0 and 3.0.1](https://github.com/apache/ratis/compare/ratis-3.0.0...ratis-3.0.1) releases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Cluster manual test.

Closes #2480 from SteNicholas/CELEBORN-1400.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-05-30 17:22:22 +08:00
mingji
fd490013ae
Revert "[CELEBORN-1388] Use finer grained locks in changePartitionManager"
This reverts commit 9f304798cb.
2024-05-30 11:18:58 +08:00
Fei Wang
493e0f10cf [CELEBORN-1317][FOLLOWUP] Fix threadDump UT stuck issue
### What changes were proposed in this pull request?

Try to fix ApiWorkerResourceSuite::threadDump UT stuck issue.
1. Using program way to get thread dump.

Related code copied from apache/spark
https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/util/Utils.scala
https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/status/api/v1/api.scala

### Why are the changes needed?
I found that sometimes the UT stuck for threadDump api:
For example: https://github.com/apache/celeborn/actions/runs/8462056188/job/23182806487?pr=2428
<img width="1291" alt="image" src="https://github.com/apache/celeborn/assets/6757692/f39d7bb9-6e31-4ce3-a573-1ff86f335318">

<img width="762" alt="image" src="https://github.com/apache/celeborn/assets/6757692/437592dd-fc9c-404d-a452-834fcf630bd1">

threadDump api UT is new introduced in [CELEBORN-1317](https://issues.apache.org/jira/browse/CELEBORN-1317).

Before there is no UT to cover that, and now it stuck sometimes.

And for getThreadDump, before it leverages processBuilder to get the thread info.

I wonder that the process is stuck because of some unknown reason, so, in this pr, we try to use program way to get thread info.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

UT.

![image](https://github.com/apache/celeborn/assets/6757692/51aaa44e-0523-4b60-b6c8-f4e83c709497)

Closes #2429 from turboFei/thread_dump.

Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-05-27 15:12:50 +08:00
SteNicholas
dd87419044
[CELEBORN-1380][FOLLOWUP] leveldbjni uses org.openlabtesting.leveldbjni to support linux aarch64 platform for leveldb via aarch64 profile
### What changes were proposed in this pull request?

Dependency leveldbjni uses `org.openlabtesting.leveldbjni` to support linux aarch64 platform for leveldb via `aarch64` profile.

Follow up #2476.

### Why are the changes needed?

Celeborn worker could not start on arm arch devices if db backend is `LevelDB`, which should support leveldbjni on the aarch64 platform.

aarch64 uses `org.openlabtesting.leveldbjni:leveldbjni-all.1.8`, and other platforms use `org.fusesource.leveldbjni:leveldbjni-all.1.8`. Meanwhile, because some hadoop dependencies packages are also depend on `org.fusesource.leveldbjni:leveldbjni-all`, but hadoop merge the similar change on trunk, details see
[HADOOP-16614](https://issues.apache.org/jira/browse/HADOOP-16614), therefore it should exclude the dependency of `org.fusesource.leveldbjni` for these hadoop packages related.

In addtion, `org.openlabtesting.leveldbjni` requires glibc version 3.4.21. Otherwise, there will be the following potential runtime risks:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007fad3630b12a, pid=62, tid=0x00007f93394ef700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode linux-amd64 )
# Problematic frame:
# C  [libc.so.6+0x8412a]
#
# Core dump written. Default location: /data/service/celeborn/core or core.62
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007f9308001000):  JavaThread "leveldb" [_thread_in_native, id=878, stack(0x00007f9338cf0000,0x00007f93394f0000)]

siginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007f97380d2220
```

Backport:

- https://github.com/apache/spark/pull/26636
- https://github.com/apache/spark/pull/31036

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2530 from SteNicholas/CELEBORN-1380.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-05-27 14:07:02 +08:00
Sanskar Modi
f527b22b4d [CELEBORN-1410] Combine multiple ShuffleBlockInfo into a single ShuffleBlockInfo
### What changes were proposed in this pull request?

Merging smaller `ShuffleBlockInfo` corresponding into same mapID, such that size of each block does not exceeds `celeborn.shuffle.chunk.size`

### Why are the changes needed?
As sorted ShuffleBlocks are contiguous, we can compact multiple `ShuffleBlockInfo` into one as long as the size of compacted one does not exceeds half of `celeborn.shuffle.chunk.size`. This way we can decrease the number of ShuffleBlock objects.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs

Closes #2524 from s0nskar/CELEBORN-1410.

Lead-authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-05-23 21:22:45 +08:00
mingji
89d56c9bbc
[CELEBORN-914] Support memory file storage
### What changes were proposed in this pull request?
To support memory file storage.

### Why are the changes needed?
To improve shuffle performance for small shuffle files.

Design doc: https://docs.google.com/document/d/1SM-oOM0JHEIoRHTYhE9PYH60_1D3NMxDR50LZIM7uW0/edit?usp=sharing

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA and manually test on a cluster.

Closes #2300 from FMX/B914.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-05-23 21:05:52 +08:00
Shuang
308eed28c9 [CELEBORN-1427] Add Capacity metrics for Celeborn
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
The Celeborn cluster does not currently provide metrics for 'TotalCapacity' and 'TotalFreeCapacity

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

Closes #2521 from RexXiong/CELEBORN-1427.

Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-05-23 16:06:11 +08:00
Mridul Muralidharan
a13d167617 [CELEBORN-1401] Add SSL support for ratis communication
### What changes were proposed in this pull request?

When SSL is enabled for master, secure the Ratis communication as well with TLS

### Why are the changes needed?

Currently, when TLS is enabled for RPC, Ratis comms still goes in the clear - add support for TLS.
Note that currently this only supports GRPC, and not netty.

### Does this PR introduce _any_ user-facing change?
Secures ratis communication when TLS is enabled at master for rpc.

### How was this patch tested?
Local tests and additional unit tests added

Closes #2515 from mridulm/CELEBORN-1401-add-ratis-ssl-support.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-05-17 17:08:11 +08:00
Shuang
8a10a2d465 [CELEBORN-1421] Refine code in master to reduce unnecessary sync to get workers/lostworkers/shutdownWorkers
### What changes were proposed in this pull request?

1. Use ConcurrentSet to replace ArrayList for workers.
2. Remove unnecessary sync and snapshot when get workers/lostworkers/shutdownWorkers

### Why are the changes needed?

1. Reduce unnecessary sync to get workers/lostworkers/shutdownWorkers.
2. Somewhere in the Master, directly using statusSystem.workers(ArrayList) is not safe, potentially leading to concurrent modification issues.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA

Closes #2507 from RexXiong/CELEBORN-1421.

Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-05-17 14:06:37 +08:00
mcdull-zhang
8875f20e72 [CELEBORN-1361] MaxInFlightPerWorker should use the value provided by PushStrategy
### What changes were proposed in this pull request?
The data push thread should first send requests to workers that are not under pressure.
Use PushStrategy's `currentMaxReqsInFlight` to better filter requests.

### Why are the changes needed?
Prevent blocking other requests

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2432 from mcdull-zhang/CELEBORN-1361.

Authored-by: mcdull-zhang <work4dong@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-05-16 19:47:04 +08:00
SteNicholas
b8647947c1
[CELEBORN-1430] TransportClientFactory should check whether handler is null when creating client
### What changes were proposed in this pull request?

`TransportClientFactory` checks whether `handler` is null when creating client.

### Why are the changes needed?

There is a case that `cachedClient.isActive()` may return true and may return false when checked for the second time when another thread is closing the channel, which causes that the `handler` may be null. Therefore, `TransportClientFactory` should check whether handler is null when creating client.

Backport https://github.com/apache/spark/pull/46506.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #2517 from SteNicholas/CELEBORN-1430.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-05-16 17:57:41 +08:00
Cheng Pan
e66d509a95
[CELEBORN-1369][FOLLOWUP] Improve docs for shuffle fallback policy
### What changes were proposed in this pull request?

Improve docs for shuffle fallback policy

Rename a configuration

```patch
- celeborn.client.spark.shuffle.forceFallback.numPartitionsThreshold
+ celeborn.client.spark.shuffle.fallback.numPartitionsThreshold
````

### Why are the changes needed?

Canonicalize the words to "spark built-in shuffle implementation" everywhere.

And `...forceFallback...` is confusing, use `...fallback...` with explicit docs instead.

### Does this PR introduce _any_ user-facing change?

Deprecate a configuration but still effective.

### How was this patch tested?

Pass CI.

Closes #2494 from pan3793/CELEBORN-1369-followup.

Lead-authored-by: Cheng Pan <chengpan@apache.org>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-05-15 19:18:39 +08:00
mingji
8dd33ceef8 [CELEBORN-1270] Introduce PbPackedPartitionLocations to (de-)serialize PartitionLocations more efficiently
### What changes were proposed in this pull request?
1. Introduces new approaches to (de-)serialize partition locations.
2. The Celeborn server remains compatible with old clients.

### Why are the changes needed?
1. Improve memory efficiency for partition locations.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
1. Pass GA.
2. Run tests on cluster:
```
val start = System.currentTimeMillis
spark.sparkContext.parallelize(1 to 10000, 10000).flatMap( _ => (1 to 950000).iterator.map(num => num)).repartition(10000).count
val after = System.currentTimeMillis
println((after-start)/1000)
```
packed RPC time: 70,65,64,64,64,64
baseline RPC time: 69,66,66,66,67,66

I think this PR does not introduce performance overhead.

4. RPC size test: this PR can reduce PRC size by up to 60%.

Closes #2456 from FMX/CELEBORN-1270.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-05-11 13:50:02 +08:00
Shuang
993d3f2587 [CELEBORN-1398] Support return leader ip to client
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
Currently, if accessing services of a Celeborn cluster across Kubernetes clusters, one may encounter DNS resolution issues. However, connectivity may be achieved through IP addresses when combined with the Kubernetes setting hostNetwork=true for clients from different clusters. At present, the `celeborn.network.bind.preferIpAddress` configuration is only effective on worker nodes. This PR will enable the feature of returning the leader's IP when accessing the master node.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA

Closes #2489 from RexXiong/CELEBORN-1398.

Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-05-08 15:01:55 +08:00
SteNicholas
1cd231f5e0
[CELEBORN-1412] celeborn.client.rpc.*.askTimeout should fallback to celeborn.rpc.askTimeout
### What changes were proposed in this pull request?

`celeborn.client.rpc.*.askTimeout` should fallback to `celeborn.rpc.askTimeout`.

### Why are the changes needed?

The config option series `celeborn.client.rpc.*.askTimeout` should fallback to `celeborn.rpc.askTimeout` instead of `celeborn.<module>.io.connectionTimeout`, which including `celeborn.client.rpc.getReducerFileGroup.askTimeout`, `celeborn.client.rpc.registerShuffle.askTimeout` and `celeborn.client.rpc.requestPartition.askTimeout`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #2492 from SteNicholas/CELEBORN-1412.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-05-07 13:47:22 +08:00
Mridul Muralidharan
5b8773c39f [CELEBORN-1406] Use Files. getLastModifiedTime to find last modified time instead of file.lastModified
### What changes were proposed in this pull request?

Use `Files.readAttributes` instead of `File.lastModified`.
`File.lastModified` has been observed to have issues in some of our build boxes - which got addressed when moving to `Files.readAttributes`

### Why are the changes needed?

Make reloading more reliable and fix flakey tests

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests.

Closes #2485 from mridulm/CELEBORN-1406.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
2024-05-06 23:15:02 -05:00
sychen
dc52192163 [CELEBORN-1409] CommitHandler commitFiles RPC supports separate timeout configuration
### What changes were proposed in this pull request?
This PR aims to supports separate timeout configuration at CommitHandler commitFiles RPC.

### Why are the changes needed?
The default value of `celeborn.worker.commitFiles.timeout` is 120s, and the default value of Client's RPC is 60s.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2488 from cxzl25/CELEBORN-1409.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-05-06 17:42:52 +08:00
xinyuwang1
7b1645ff6a [CELEBORN-1369] Support for disable fallback to Spark's default shuffle
### What changes were proposed in this pull request?
An option to disable fallback is provided.

### Why are the changes needed?
It's dangerous to fallback to external shuffle when applications run on both online and offline nodes because online services could be impacted due to a shortage of disk capacity.

### Does this PR introduce _any_ user-facing change?
Yes, fallback to Spark's default shuffle can be disabled by setting `celeborn.client.spark.shuffle.fallback.enabled=false`

### How was this patch tested?
manual test

Closes #2444 from littlexyw/fallback_disable.

Lead-authored-by: xinyuwang1 <xinyuwang1@xiaohongshu.com>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-05-03 14:32:28 +08:00
CodingCat
9f304798cb [CELEBORN-1388] Use finer grained locks in changePartitionManager
### What changes were proposed in this pull request?

this PR proposes to use finer grained lock in  changePartitionManager when handling requests for different partitions

### Why are the changes needed?

we observed the intensive competition of locks when there are many partition got split. most of  change-partition-executor threads are competing for the concurrenthashmap used in ChangePartitionManager...this concurrentHashMap is holding request per partition but we are lock at the whole map instead of per partition level,

with this change, the driver memory footprint is significantly reduced due to the increased processing throughput...

### Does this PR introduce _any_ user-facing change?

one more configs

### How was this patch tested?

prod

Closes #2462 from CodingCat/finer_grained_locks.

Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-04-30 19:51:07 +08:00
Mridul Muralidharan
29b5586a60 [CELEBORN-1353] Document Celeborn security - authentication and SSL support
### What changes were proposed in this pull request?

User documentation for configuring SSL and authentication

### Why are the changes needed?

Document the new features.

### Does this PR introduce _any_ user-facing change?
Yes, updates documentation

### How was this patch tested?

Used `mkdocs serve` to render and validate the documentation.

Closes #2481 from mridulm/ssl-auth-documentation.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-04-30 14:37:56 +08:00
SteNicholas
9110eab996
[CELEBORN-1380] leveldbjni uses org.openlabtesting.leveldbjni to support linux aarch64 platform for leveldb
### What changes were proposed in this pull request?

Dependency leveldbjni uses `org.openlabtesting.leveldbjni` to support linux aarch64 platform for leveldb.

### Why are the changes needed?

Celeborn worker could not start on arm arch devices if db backend is `LevelDB`, which should support leveldbjni on the aarch64 platform.

aarch64 uses `org.openlabtesting.leveldbjni:leveldbjni-all.1.8`, and other platforms use `org.fusesource.leveldbjni:leveldbjni-all.1.8`. Meanwhile, because some hadoop dependencies packages are also depend on `org.fusesource.leveldbjni:leveldbjni-all`, but hadoop merge the similar change on trunk, details see
[HADOOP-16614](https://issues.apache.org/jira/browse/HADOOP-16614), therefore it should exclude the dependency of `org.fusesource.leveldbjni` for these hadoop packages related.

Backport:

- https://github.com/apache/spark/pull/26636
- https://github.com/apache/spark/pull/31036

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2476 from SteNicholas/CELEBORN-1380.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-04-24 11:52:56 +08:00
Mridul Muralidharan
b7fa30622e [CELEBORN-1354][FOLLOWUP] Split rpc_app into rpc_app_lifecyclemanager and rpc_app_client
### What changes were proposed in this pull request?

Based on [review feedback here](https://github.com/apache/celeborn/pull/2469#discussion_r1573228448), split `rpc_app` module into two internal transport modules - `rpc_app_lifecyclemanager` and `rpc_app_client`.
These are not directly configured by users - but internally allow us to differentiate between lifecycle manager and executor

### Why are the changes needed?

auto ssl is to be configured only at lifecycle manager - doing it at executors is beneign, but not very useful (it results in creation of local files, etc).
Avoid this by handling it specifically only for lifecyclemanager.

This is based on [review feedback](https://github.com/apache/celeborn/pull/2469#discussion_r1573228448).

### Does this PR introduce _any_ user-facing change?
No, the modules are all internal and not user visible.

### How was this patch tested?
Unit tests were updated

Closes #2471 from mridulm/support-app-auto-ssl-followup.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-04-22 19:57:10 +08:00
Mridul Muralidharan
11c7d1080c [CELEBORN-1354] auto ssl for rpc_app transport module
### What changes were proposed in this pull request?

Support over the wire encryption between client side components for `rpc_app` transport module without requiring keystore/truststore to be configured.

### Why are the changes needed?

As detailed in CELEBORN-1354, support over the wire encryption between client side components (`rpc_app` transport module) without needing each application to be configured with certificates.

When enabled, `SSLFactory` will create a self-signed certificate and leverage it to secure the rpc comms.

### Does this PR introduce _any_ user-facing change?
Introduces a configuration to enable/disable this behavior.

### How was this patch tested?
New tests added to validate functionality.

Closes #2469 from mridulm/support-app-auto-ssl.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-04-20 23:17:10 +08:00
SteNicholas
8e4ddaa467
[CELEBORN-1392] TransportClientFactory should regard as zero for negative celeborn.<module>.io.connectTimeout/connectionTimeout
### What changes were proposed in this pull request?

`TransportClientFactory` should regard as zero for negative `celeborn.<module>.io.connectTimeout` and `celeborn.<module>.io.connectionTimeout`.

### Why are the changes needed?

When `celeborn.<module>.io.connectionTimeout` is 0 that means unlimited to netty, `ChannelFuture.await(0)` fails directly and inappropriately. Meanwhile, whhen `celeborn.<module>.io.connectionTimeout` is less than 0 that causes meaningless transport client reconnections and endless reconstructions.

Backport:

- https://github.com/apache/spark/pull/41785
- https://github.com/apache/spark/pull/42619

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`TransportClientFactorySuiteJ#unlimitedConnectAndConnectionTimeouts`

Closes #2467 from SteNicholas/CELEBORN-1392.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-04-19 19:28:02 +08:00
jiang13021
d6fe6f2e33 [CELEBORN-1391] Retry when MasterClient receiving a RpcTimeoutException
### What changes were proposed in this pull request?
Retry when MasterClient receiving a RpcTimeoutException

### Why are the changes needed?
When the MasterClient encounters an RpcTimeoutException, it may indicate that the current master is either busy or unavailable. In such cases, retrying with an alternative master endpoint could work.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test: org.apache.celeborn.common.client.MasterClientSuiteJ#testOneMasterTimeoutInHA

Closes #2466 from jiang13021/celeborn-1391.

Authored-by: jiang13021 <jiangyanze.jyz@antgroup.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-04-19 11:28:03 +08:00
Mridul Muralidharan
f27ede42c4
[CELEBORN-1356] Split rpc module into rpc_app and rpc_service
### What changes were proposed in this pull request?

Split the `rpc` transport module into `rpc_app` and `rpc_service` to allow for them to be independently configured.

### Why are the changes needed?

We need the ability to independently configure communication between application components (driver/executors in spark applications) and those to/from Celeborn service (master/workers) components.

This is particularly relevant for TLS support where applications might be running with TLS disabled for their rpc services or using self-signed certificates (see CELEBORN-1354 for an example), while services would have signed certs.

### Does this PR introduce _any_ user-facing change?

Yes, it allows users to independently configure rpc env within the application and those to/from services.
Backward compatibility is maintained - and so existing `rpc` is the fallback in case `rpc_app` or `rpc_service` config is not found.

### How was this patch tested?

Unit tests were enhanced, existing tests pass.

Closes #2460 from mridulm/split_rpc_module-retry1.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-04-17 14:59:23 +08:00
CodingCat
121395f0f5 [CELEBORN-1314] add capacity-bounded inbox for rpc endpoint
### What changes were proposed in this pull request?

we found a lot of driver OOM issue when dealing with spark applications with super large shuffle,

with the heap dump, we found the inbox of rpc endpoints accumulated tons of change partition location message... even we have increased splitThreshold to 10G, many jobs still have this issue (keep increasing this value will increase the risk of disk overusage of workers)

This PR implements capacity-bounded inbox which is based on a LinkedBlockingQueue with a configured capacity, we found it effectively resolves the problem for us

### Why are the changes needed?

the following screenshots show the main memory consumer in Driver side

<img width="661" alt="image" src="https://github.com/apache/incubator-celeborn/assets/678008/d63196cc-6c3c-4b32-a9db-9871e7cb5fd8">
<img width="723" alt="image" src="https://github.com/apache/incubator-celeborn/assets/678008/64a506c4-03ea-4932-98ba-f8f4923daa6e">

### Does this PR introduce _any_ user-facing change?

no, but two more configurations

### How was this patch tested?

integration tests and unit tests

screenshot showing the application driver memory usage with the patch (blue line)

<img width="766" alt="image" src="https://github.com/apache/incubator-celeborn/assets/678008/86ecaba8-c164-4aef-ad83-cee03238e5da">

screenshot showing the application driver memory usage without patch (brown line)

<img width="799" alt="image" src="https://github.com/apache/incubator-celeborn/assets/678008/a012e0ba-0292-4d25-a7b9-252bdc3cb8cb">

Closes #2366 from CodingCat/memory_bounded_driver.

Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-04-16 10:56:32 +08:00
SteNicholas
3ac769e4fa
[CELEBORN-1236][FOLLOWUP] Gauge is_terminating, is_terminated and is_shutdown should represent a single numerical value
### What changes were proposed in this pull request?

Gauge `is_terminating`, `is_terminated` and `is_shutdown` should represent a single numerical value instead of boolean value.

### Why are the changes needed?

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. The value type of `is_terminating`, `is_terminated` and `is_shutdown` should be numerical, otherwise `AbstractSource#addGauge` would warn the failed log as follows:

```
2024-04-12 20:04:12,438 [WARN] [main] - org.apache.celeborn.common.metrics.source.ThreadPoolSource -Logging.scala(55) -Add gauge is_terminating failed, the value type class java.lang.Boolean is not a number
2024-04-12 20:04:12,438 [WARN] [main] - org.apache.celeborn.common.metrics.source.ThreadPoolSource -Logging.scala(55) -Add gauge is_terminated failed, the value type class java.lang.Boolean is not a number
2024-04-12 20:04:12,438 [WARN] [main] - org.apache.celeborn.common.metrics.source.ThreadPoolSource -Logging.scala(55) -Add gauge is_shutdown failed, the value type class java.lang.Boolean is not a number
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes #2457 from SteNicholas/CELEBORN-1236.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-04-15 11:34:34 +08:00
SteNicholas
1d3558bd14 [CELEBORN-1385] HttpServer support idle timeout configuration of Jetty
### What changes were proposed in this pull request?

Introduce `celeborn.master.http.idleTimeout` and `celeborn.worker.http.idleTimeout` to support idle timeout configuration of Jetty for `HttpServer`.

### Why are the changes needed?

`ServerConnector` supports HTTP idle timeout configuration via `jetty.http.idleTimeout`, of which default value is 30000ms that is configured as `jetty.http.idleTimeout=300000`. `HttpServer` should also support idle timeout configuration of Jetty, which timeout is as follows:

```
2024-04-12 16:04:00,926 [DEBUG] [master-JettyScheduler-1] - org.eclipse.jetty.io.IdleTimeout -IdleTimeout.java(161) -SocketChannelEndPoint567d3f82{l=/127.0.0.1:9097,r=/127.0.0.1:35276,OPEN,fill=FI,flush=-,to=29999/30000}{io=1/1,kio=1,kro=1}->HttpConnection2f88da0c[p=HttpParser{s=START,0 of -1},g=HttpGenerator796c3666{s=START}]=>HttpChannelOverHttp63815646{s=HttpChannelState5c192497{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=5,c=false/false,a=IDLE,uri=null,age=0} idle timeout check, elapsed: 29999 ms, remaining: 1 ms
2024-04-12 16:04:00,927 [DEBUG] [master-JettyScheduler-1] - org.eclipse.jetty.io.IdleTimeout -IdleTimeout.java(161) -SocketChannelEndPoint567d3f82{l=/127.0.0.1:9097,r=/127.0.0.1:35276,OPEN,fill=FI,flush=-,to=30001/30000}{io=1/1,kio=1,kro=1}->HttpConnection2f88da0c[p=HttpParser{s=START,0 of -1},g=HttpGenerator796c3666{s=START}]=>HttpChannelOverHttp63815646{s=HttpChannelState5c192497{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=5,c=false/false,a=IDLE,uri=null,age=0} idle timeout check, elapsed: 30001 ms, remaining: -1 ms
2024-04-12 16:04:00,927 [DEBUG] [master-JettyScheduler-1] - org.eclipse.jetty.io.IdleTimeout -IdleTimeout.java(168) -SocketChannelEndPoint567d3f82{l=/127.0.0.1:9097,r=/127.0.0.1:35276,OPEN,fill=FI,flush=-,to=30001/30000}{io=1/1,kio=1,kro=1}->HttpConnection2f88da0c[p=HttpParser{s=START,0 of -1},g=HttpGenerator796c3666{s=START}]=>HttpChannelOverHttp63815646{s=HttpChannelState5c192497{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=5,c=false/false,a=IDLE,uri=null,age=0} idle timeout expired
2024-04-12 16:04:00,927 [DEBUG] [master-JettyScheduler-1] - org.eclipse.jetty.io.FillInterest -FillInterest.java(136) -onFail FillInterest6cc48840{AC.ReadCB2f88da0c{HttpConnection2f88da0c::SocketChannelEndPoint567d3f82{l=/127.0.0.1:9097,r=/127.0.0.1:35276,OPEN,fill=FI,flush=-,to=30001/30000}{io=1/1,kio=1,kro=1}->HttpConnection2f88da0c[p=HttpParser{s=START,0 of -1},g=HttpGenerator796c3666{s=START}]=>HttpChannelOverHttp63815646{s=HttpChannelState5c192497{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=5,c=false/false,a=IDLE,uri=null,age=0}}}
java.util.concurrent.TimeoutException: Idle timeout expired: 30001/30000 ms
    at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171) ~[jetty-io-9.4.52.v20230823.jar:9.4.52.v20230823]
    at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113) ~[jetty-io-9.4.52.v20230823.jar:9.4.52.v20230823]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_162]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_162]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_162]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_162]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_162]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_162]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_162]
2024-04-12 16:04:00,927 [DEBUG] [master-JettyScheduler-1] - org.eclipse.jetty.http.HttpParser -HttpParser.java(1883) -close HttpParser{s=START,0 of -1}
2024-04-12 16:04:00,927 [DEBUG] [master-JettyScheduler-1] - org.eclipse.jetty.http.HttpParser -HttpParser.java(1912) -START --> CLOSE
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2455 from SteNicholas/CELEBORN-1385.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-04-14 12:40:57 +08:00
Angerszhuuuu
b65b5433dc
[CELEBORN-1376] Push data failed should always release request body
### What changes were proposed in this pull request?
Worker netty not release
<img width="1729" alt="截屏2024-04-07 17 26 40" src="https://github.com/apache/celeborn/assets/46485123/5774f735-570b-448e-ab94-4c78661717f5">

Many push failed
<img width="767" alt="截屏2024-04-07 17 27 46" src="https://github.com/apache/celeborn/assets/46485123/41866bd0-d634-4dbf-8518-b474c8d1faad">

1. For spark shuffle client, enable it release push data body when rpc failure
2. For flink client, since it use wrapped bytbuf, we need release push data body when rpc failure and release origin body when rpc completed.
3. For worker replicate, we should enable it release push data body when rpc failure.

### Why are the changes needed?
Avoid worker netty memory leak

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2449 from AngersZhuuuu/CELEBORN-1376.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-04-10 19:42:14 +08:00
Aravind Patnam
f04ebccd4d
[CELEBORN-1368] Log celeborn config for debugging purposes
### What changes were proposed in this pull request?
Log celeborn config for debugging purposes.

### Why are the changes needed?
Help with debugging

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
tested the patch internally.

Closes #2442 from akpatnam25/CELEBORN-1368.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-04-08 15:11:35 +08:00
CodingCat
c788c38025
[CELEBORN-1328] Introduce ActiveSlotsCount metric to monitor the number of active slots
### What changes were proposed in this pull request?

Introduce `ActiveSlots` metric to represent the disk resource demand currently in the cluster.

### Why are the changes needed?

It's recommended to introduce `ActiveSlots` metric to represent the disk resource demand currently in the cluster.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

In our test cluster (we can see the value of activeSlots increases and then back to 0 after the application finished, and slotsAllocated is increasing all the way).

![image](https://github.com/apache/incubator-celeborn/assets/678008/c05aa763-11ad-4bbd-9ae0-dd6a9cb01ac5)

Closes #2386 from CodingCat/slots_decrease.

Lead-authored-by: CodingCat <zhunansjtu@gmail.com>
Co-authored-by: Nan Zhu <CodingCat@users.noreply.github.com>
Co-authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-04-08 11:08:05 +08:00
Mridul Muralidharan
b1f8ec8357 [CELEBORN-1351] Introduce SSLFactory and enable TLS support
### What changes were proposed in this pull request?

Add SSLFactory, and wire up TLS support with rest of Celeborn to enable secure over the wire communication.

### Why are the changes needed?
Add support for TLS to secure wire communication.
This is the last PR to add basic support for TLS.
There will be a follow up for CELEBORN-1356 and documentation ofcourse !

### Does this PR introduce _any_ user-facing change?
Yes, completes basic support for TLS in Celeborn.

### How was this patch tested?
Existing tests, augmented with additional unit tests.

Closes #2438 from mridulm/add-sslfactory-and-related-changes.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-04-08 10:42:29 +08:00
Chandni Singh
0d72c95958 [CELEBORN-1365] Ensure that a client cannot update the metadata belonging to a different application
### What changes were proposed in this pull request?
This ensures that an authenticated client does not update the metadata belonging to another application.

### Why are the changes needed?
The changes are needed for authentication support.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #2441 from otterc/CELEBORN-1365.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-04-08 10:35:13 +08:00
Mridul Muralidharan
186899f53f
[CELEBORN-1371] Update ratis with internal port endpoint address as well (#2446)
* Update ratis with internal port endpoint address as well, and propagate it to workers, while keeping existing path for applications the same
---------

Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
2024-04-05 14:42:03 -04:00
Mridul Muralidharan
5065dbed59
[CELEBORN-1372] Update ControlMessages to handle ApplicationMeta and ApplicationMetaRequest
### What changes were proposed in this pull request?

With authentication enabled, rpc requests from executors to driver and vice versa fail when trying to send/receive `ApplicationMeta` and `ApplicationMetaRequest`.

### Why are the changes needed?

Executor is unable to communicate with the driver with auth enabled.

### Does this PR introduce _any_ user-facing change?

No, bug fix for feature introduced in 0.5

### How was this patch tested?

Tested with spark 3.1 against a patched version of Celeborn (pre-reqs: #2445, #2446)

Closes #2447 from mridulm/CELEBORN-1372-fix-ApplicationMeta-serde.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-04-05 08:49:32 +08:00
Mridul Muralidharan
f63adc51e4
[CELEBORN-1370] Exception with authentication is enabled when creating send-application-meta thread pool
### What changes were proposed in this pull request?

Change the initialization order, so that `sendApplicationMetaThreads` has been initialized before the dispatcher initalizes for master.
Currently it ends up being `0` as `onStart` ends up getting called as part of object creation - before `sendApplicationMetaThreads` has been initialized (and so ends up with default value of `0`).

### Why are the changes needed?

Ensure `sendApplicationMetaExecutor` is created when auth is enabled, and rest of `Master.onStart` completes.

### Does this PR introduce _any_ user-facing change?

No, fixes a bug in master.

### How was this patch tested?

Local deployment, existing unit tests.

Closes #2445 from mridulm/CELEBORN-1370.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-04-05 08:47:58 +08:00
Chandni Singh
6153ba4c62 [CELEBORN-1360] Ensure that a client cannot push or fetch data belonging to a different application
### What changes were proposed in this pull request?
This ensures that an authenticated client is not trying to push or fetch data which belongs to another application.

### Why are the changes needed?
The changes are needed for authentication support.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #2431 from otterc/CELEBORN-1360.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-04-03 09:56:40 +08:00
Mridul Muralidharan
3ff8812cdd [CELEBORN-1348] Update infrastructure for SSL communication
### What changes were proposed in this pull request?

Update infrastructure for SSL support.
Please see #2416 for the consolidated PR with all the changes for reference.

### Why are the changes needed?

At a high level, the changes are:
* `ManagedBuffer.convertToNettyForSsl`, to support SSL encryption.
* Add `EncryptedMessageWithHeader`, which is used to wrap the message and body, for use with SSL.
* `SslMessageEncoder`  is an encoder for SSL

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

The overall PR #2416 (and this PR as well) passes all tests, and this PR includes relevant subset of tests.

Closes #2427 from mridulm/update-infra-for-ssl.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-04-01 19:59:44 +08:00
SteNicholas
82022a9427
[CELEBORN-1362] Remove unnecessary configuration celeborn.client.flink.inputGate.minMemory and celeborn.client.flink.resultPartition.minMemory
### What changes were proposed in this pull request?

Remove unnecessary configuration `celeborn.client.flink.inputGate.minMemory` and `celeborn.client.flink.resultPartition.minMemory`.

### Why are the changes needed?

`celeborn.client.flink.inputGate.minMemory` and `celeborn.client.flink.resultPartition.minMemory` are configured as min memory reserved at present. Meanwhile, `celeborn.client.flink.inputGate.memory` should be at least `networkBufferSize * MIN_BUFFERS_PER_GATE` bytes, and `celeborn.client.flink.resultPartition.memory` should be at least `networkBufferSize * MIN_BUFFERS_PER_PARTITION` bytes. Therefore, `celeborn.client.flink.inputGate.minMemory` and `celeborn.client.flink.resultPartition.minMemory` are unnecessary configuration for `celeborn.client.flink.inputGate.memory` and `celeborn.client.flink.resultPartition.memory`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`PluginSideConfSuiteJ#testCoalesce`

Closes #2433 from SteNicholas/CELEBORN-1362.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-04-01 11:15:14 +08:00
SteNicholas
ff2bc92067 [CELEBORN-1317][FOLLOWUP] Update default value of celeborn.master.http.maxWorkerThreads and celeborn.worker.http.maxWorkerThreads via QueuedThreadPool
### What changes were proposed in this pull request?

Update default value of `celeborn.master.http.maxWorkerThreads` and `celeborn.worker.http.maxWorkerThreads` via `QueuedThreadPool`, of which default value is 200.

### Why are the changes needed?

`QueuedThreadPool` determines that the default minimum threads is 8, and the default maximum threads is 200 in [QueuedThreadPool#L121](48f6ab7289/jetty-core/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java (L1210)) and [QueuedThreadPool#L125](48f6ab7289/jetty-core/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java (L125)).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2428 from SteNicholas/CELEBORN-1317.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-29 11:56:04 +08:00
SteNicholas
6fdeced158 [CELEBORN-1359] Support Netty Logging at the network layer
### What changes were proposed in this pull request?

Support Netty level logging at the network layer for Celeborn. To configure Netty level logging a LogHandler must be added to the channel pipeline. `NettyLogger` is introduced as a new class which is able to construct a log handler depending on the log level:

- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="DEBUG" additivity="false">`: a custom log handler is created which does not dump the message contents. This way the log is a bit more compact. Moreover when network level encryption is switched on this level might be sufficient.
- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="TRACE" additivity="false">`: Netty's own log handler is used which dumps the message contents.
- Otherwise (when the logger is not `TRACE` or `DEBUG`) the pipeline does not contain a log handler (there is no runtime penalty for the default setting but a long running service must be restarted along with the new log level to have an effect).

Backport:

- [[SPARK-36719][CORE] Supporting Netty Logging at the network layer](https://github.com/apache/spark/pull/33962)
- [[SPARK-45377][CORE] Handle InputStream in NettyLogger](https://github.com/apache/spark/pull/43165)

### Why are the changes needed?

This level of logging proved to be sufficient during debugging some external shuffle related problem. Compared with the tcpdump this log lines can be more easily correlated with the Celeborn internal calls. Moreover the log layout can be configured to contain the thread names that way for a timeout a busy thread could be identified.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Local manually test.

Closes #2423 from SteNicholas/CELEBORN-1359.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-28 16:11:37 +08:00
Fei Wang
adbc77cd4f [CELEBORN-1317] Refine celeborn http server and support swagger ui
### What changes were proposed in this pull request?

Before, there is no http request spec likes query param, http method and response mediaType.
And for each api, a HttpEndpoint class is needed.

In this PR, we refine the code for http service and provide swagger ui.

Note that: This pr does not change the orignal api request and response behavior, including metrics APIs.

TODO:
1. define DTO
2. http request authentication

<img width="1900" alt="image" src="https://github.com/apache/incubator-celeborn/assets/6757692/7f8c2363-170d-4bdf-b2c9-74260e31d3e5">

<img width="1138" alt="image" src="https://github.com/apache/incubator-celeborn/assets/6757692/3ae6ec8e-00a8-475b-bb37-0329536185f6">

### Why are the changes needed?

To close CELEBORN-1317

### Does this PR introduce _any_ user-facing change?

The api is align with before.

### How was this patch tested?
UT.

Closes #2371 from turboFei/jetty.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-27 23:18:18 +08:00
Mridul Muralidharan
b14254be9a
[CELEBORN-1349] Add SSL related configs and support for ReloadingX509TrustManager
Add SSL related configs and support for `ReloadingX509TrustManager`, required for enabling SSL support.
Please see #2416 for the consolidated PR with all the changes for reference.

Introduces SSL related configs for enabling and configuring use of TLS.

Yes, introduces configs to control behavior of SSL

The overall PR #2411 (and this PR as well) passes all tests, this is specifically pulling out the `ReloadingX509TrustManager` and config related changes

Closes #2419 from mridulm/config-for-ssl.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-27 18:21:14 +08:00
Mridul Muralidharan
4400089708
[CELEBORN-1346] Add build changes and test resources for ssl support
### What changes were proposed in this pull request?

Build changes and test resources for enabling SSL support.
Please see #2416 for the consolidate PR with all the changes for reference.

Note: I closed the older PR #2413 and reopened this one give the repo changes.

### Why are the changes needed?

Build dependency updates and addition of test resources for use with tests.
The specific tests leveraging these will be added in subsequent jiras linked off of CELEBORN-1343
Splitting it up into multiple PR's to reduce the review load.

### Does this PR introduce _any_ user-facing change?

io.netty:netty-tcnative-boringssl-static is an additional dependency.
org.bouncycastle:* are test dependencies which should have no user facing changes.

### How was this patch tested?
The overall PR #2411 passes all tests, this is specifically pulling out the dependency changes and resources.

Closes #2417 from mridulm/build-and-test-for-tls.

Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-03-26 21:50:54 +08:00
zky.zhoukeyong
fc238005bd [CELEBORN-1144] Batch OpenStream RPCs
### What changes were proposed in this pull request?
Batch OpenStream RPCs by Worker to avoid too many RPCs.

### Why are the changes needed?
ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passes GA and Manual tests.

Closes #2362 from waitinfuture/1144.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-25 16:25:05 +08:00
SteNicholas
e29f013e3a [CELEBORN-1357] AbstractRemoteShuffleResultPartitionFactory should remove the check of shuffle compression codec
### What changes were proposed in this pull request?

`AbstractRemoteShuffleResultPartitionFactory` removes the check of shuffle compression codec.

### Why are the changes needed?

`AbstractRemoteShuffleResultPartitionFactory` checks whether shuffle compression codec is LZ4 for Flink 1.14 and 1.15 version at present. Meanwhile, since Flink 1.17 version, ZSTD has already been supported. Therefore `AbstractRemoteShuffleResultPartitionFactory` should remove the check of shuffle compression codec for Flink 1.17 version and above, which is checked via the constructor of `BufferCompressor`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `RemoteShuffleResultPartitionFactorySuiteJ`

Closes #2414 from SteNicholas/CELEBORN-1357.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-25 15:44:45 +08:00
lvshuang.xjs
9497d557e6
[CELEBORN-1345] Add a limit to the master's estimated partition size
### What changes were proposed in this pull request?
Currently, the Celeborn master calculates the estimatedPartitionSize based on the fileInfo committed by the application. This estimate is then used to allocate slots across all workers. However, this partition size may be too large or too small for Celeborn. For example, if an application commits a single file of 1TB to only one worker, using that partition size could result in all other workers having no available slots or only very small slots. To improve this, it would be better to implement a cap on the master's estimated partition size to prevent such imbalances.

### Why are the changes needed?
As title

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #2412 from RexXiong/CELEBORN-1345.

Lead-authored-by: lvshuang.xjs <lvshuang.xjs@taobao.com>
Co-authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-03-25 14:40:47 +08:00
SteNicholas
73cf1562f7 [CELEBORN-1299] Introduce JVM profiling in Celeborn Worker using async-profiler
### What changes were proposed in this pull request?

Introduce JVM profiling `JVMProfier` in Celeborn Worker using async-profiler to capture CPU and memory profiles.

### Why are the changes needed?

[async-profiler](https://github.com/async-profiler) is a sampling profiler for any JDK based on the HotSpot JVM that does not suffer from Safepoint bias problem. It has low overhead and doesn’t rely on JVMTI. It avoids the safepoint bias problem by using the `AsyncGetCallTrace` API provided by HotSpot JVM to profile the Java code paths, and Linux’s perf_events to profile the native code paths. It features HotSpot-specific APIs to collect stack traces and to track memory allocations.
The feature introduces a profier plugin that does not add any overhead unless enabled and can be configured to accept profiler arguments as a configuration parameter. It should support to turn profiling on/off, includes the jar/binaries needed for profiling.

Backport [[SPARK-46094] Support Executor JVM Profiling](https://github.com/apache/spark/pull/44021).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Worker cluster test.

Closes #2409 from SteNicholas/CELEBORN-1299.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-25 14:05:50 +08:00
CodingCat
c6c319d865 [CELEBORN-1309][FOLLOWUP] Cap the max memory can be used for sort buffer
### What changes were proposed in this pull request?

add a new parameter to cap the max memory can be used for sort writer buffer

### Why are the changes needed?

with a huge number of partitions, the threshold based on buffer size * number of partitions without this cap can be too large, e.g. 64K * 100000 = 6G

### Does this PR introduce _any_ user-facing change?

a new parameter

### How was this patch tested?

ut

Closes #2388 from CodingCat/adaptive_followup.

Lead-authored-by: CodingCat <zhunansjtu@gmail.com>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-25 12:08:04 +08:00
SteNicholas
d62f75fdc7 [MINOR] Unifiy license format of pom.xml
### What changes were proposed in this pull request?

Unifiy license format of `pom.xml`.

### Why are the changes needed?

There are different license formats among modules, which standard license format has indent before `~`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2408 from SteNicholas/maven-license-format.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-21 14:34:49 +08:00
Mridul Muralidharan
21d5698a90 [CELEBORN-1339] Mark connection as timedOut in TransportClient.close
### What changes were proposed in this pull request?

Importing details from https://github.com/apache/spark/pull/43162:

--
This PR avoids a race condition where a connection which is in the process of being closed could be returned by the TransportClientFactory only to be immediately closed and cause errors upon use.

This race condition is rare and not easily triggered, but with the upcoming changes to introduce SSL connection support, connection closing can take just a slight bit longer and it's much easier to trigger this issue.

Looking at the history of the code I believe this was an oversight in https://github.com/apache/spark/pull/9853.

--

### Why are the changes needed?

We are working towards adding TLS support, which is essentially based on Spark 4.0 TLS support, and this is one of the fixes from there.
(I am yet to file the overall TLS support jira yet, but this is enabling work).

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

Closes #2400 from mridulm/add-SPARK-45375.

Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-03-20 08:50:14 +08:00
sychen
9938143736 [MINOR] Fix typo in TransportClient
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2406 from cxzl25/TransportClient_typo.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-03-20 08:41:59 +08:00
sychen
91f6378682 [CELEBORN-1336] Remove client partition split pool
### What changes were proposed in this pull request?

### Why are the changes needed?
`CELEBORN-1320` uses `ReviveManager` to batch processing SOFT_SPLIT event RPC, so `partitionSplitPool` is no longer used, and the configuration item `celeborn.client.push.splitPartition.threads` is meaningless.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2396 from cxzl25/CELEBORN-1336.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-03-18 21:48:59 +08:00
SteNicholas
d33ac28945 [MINOR] Fix typo of celeborn.network.bind.preferIpAddress doc
### What changes were proposed in this pull request?

Fix typo of `celeborn.network.bind.preferIpAddress` doc from `ture` to `true`.

### Why are the changes needed?

`celeborn.network.bind.preferIpAddress` doc has typo for `ture`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2392 from SteNicholas/prefer-ip-address.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-03-14 17:12:33 +08:00
mingji
5043f46aa7 [CELEBORN-863][FOLLOWUP] Fix persisted committed file infos lost
### What changes were proposed in this pull request?
To fix a bug that might cause persisted committed file info lost.

### Why are the changes needed?
A worker starts will clean its persisted committed file info and won't put back if this worker restart again, the committed file infos will lost.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA.

Closes #2390 from FMX/b863-1.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-03-14 13:51:10 +08:00
Curtis Howard
23269d77e9
[CELEBORN-1327] Support Spark 3.5 with JDK21
### What changes were proposed in this pull request?
Updates Celeborn to account for [JDK-8303083](https://bugs.openjdk.org/browse/JDK-8303083), which affects JDK21.  (similar to the Apache Spark change here:
https://github.com/apache/spark/pull/39909)

### Why are the changes needed?
Without this change, Spark users of Celeborn will encounter a runtime error similar to the following:
`Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.IllegalStateException: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long, int) [in thread "Executor task launch worker for task 0.0 in stage 0.0 (TID 0)"]
        at org.apache.celeborn.common.unsafe.Platform.<clinit>(Platform.java:135)
        ... 16 more`

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested using standalone Spark 3.5.1 (Hadoop 3.2) and the Celeborn `main` branch with JDK21 build changes in https://github.com/apache/incubator-celeborn/pull/2385.  Reproduced the runtime error above and confirmed the patch resolves it.

Closes #2387 from curtishoward/CELEBORN-1327.

Authored-by: Curtis Howard <curtis@curtiss-mbp.lan>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-03-13 23:08:09 +08:00
CodingCat
2e251457f3 [CELEBORN-1309] Support adaptive management of memory threshold for SortBasedWriter
### What changes were proposed in this pull request?

while SortBasedWriter has less memory footprint than HashBasedWriter, it suffers from performance issue when we have  many partitions and the write buffer is filled with small chunks of data quickly

 for example, if sort buffer size is 32K, you have 4 partitions and 128K data in total, the data distribution is like partition A, B, C, D, each time it comes with 8K per partition.... in this case, you need to compress and send small 8K chunk 4 times per partition , the cost would become very high. If you use hashbasedwriter, it doesn't have this problem since the push only happens when the per-partition buffer is full. Of course , larger sort buffer size can mitigate the issue, but tuning sort buffer size per job is a tedious work

this PR introduces a new feature that we measure total size of pushed bytes and pushed count as well as the "should-pushed" bytes and counts (should-push means that , the data we pushed is larger than CLIENT_PUSH_BUFFER_MAX_SIZE (in another word, we will trigger a push even with hashbasedwriter in this case))

when actualPushedBytes/actualPushedCounts > (1 + Threshold) * (ShouldPushBytes/ShouldPushCounts), we will enlarge the sort buffer size by 1X to try to buffer more data before pushing  (the max size of sortBuffer would be capped at # of partitions * CLIENT_PUSH_BUFFER_MAX_SIZE)

### Why are the changes needed?

to reduce perf cost in sortbased writer

### Does this PR introduce _any_ user-facing change?

no, but have 2 extra configurations

### How was this patch tested?

in prod of our company and also unit test

Closes #2358 from CodingCat/adaptive_memory_threshold.

Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-13 13:54:12 +08:00
Aravind Patnam
ca64bb5143 [CELEBORN-1313] Custom Network Location Aware Replication
### What changes were proposed in this pull request?

Enable custom network location aware replication, based on a custom impl of `DNSToSwitchMapping`.

### Why are the changes needed?

Resolution of network location of multiple workers at master can be expensive at times. This way, each worker resolves its own network location and sends to master via the RegisterWorker transport message. If worker cannot resolve, fallback to attempting to resolve at master (during update meta or reload of snapshot). Proposal: [Celeborn Custom Network Location Aware Replication](https://docs.google.com/document/d/11M_MKKnIXCTExJHMX-OMTq7SBpkl8fJMlpy8hLgmev0/edit#heading=h.s3vnydz589z5)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Updated the unit tests.

Closes #2367 from akpatnam25/CELEBORN-1313.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-13 11:10:30 +08:00
Fei Wang
597f594f93 [CELEBORN-1324][MINOR] Remove unused PrometheusSink class
### What changes were proposed in this pull request?

PrometheusSink is not used.

### Why are the changes needed?

Close CELEBORN-1324

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
Not needed.

Closes #2381 from turboFei/remove_unused.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-03-12 14:07:12 +08:00
Chandni Singh
96df0d6e3c [CELEBORN-1179] Add support in Celeborn Workers to fetch application meta from the Master
### What changes were proposed in this pull request?
This enables a Celeborn Worker to retrieve the application meta from the Master if it hasn't received the secret from the Master before the application attempts to connect to it. Additionally, the Celeborn Worker's SecretRegistry has been converted into an LRU cache to prevent unbounded growth of the registry.

### Why are the changes needed?
This is last change needed for Auth support in Celeborn (https://issues.apache.org/jira/browse/CELEBORN-1011)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UTs and part of a bigger change which will be tested end-to-end.

Closes #2363 from otterc/CELEBORN-1179.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-11 14:15:08 +08:00
sychen
b3eed34b57
[CELEBORN-1293] Output received signals at master and worker
### What changes were proposed in this pull request?
When we shut down the master or worker, we can output the signal as a record.

### Why are the changes needed?
Conveniently track the status of master and workers.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
local test

```bash
./sbin/stop-all.sh
```

```
12:20:59.932 [SIGTERM handler] ERROR org.apache.celeborn.service.deploy.master.Master - RECEIVED SIGNAL TERM
```

```
12:20:59.563 [SIGTERM handler] ERROR org.apache.celeborn.service.deploy.worker.Worker - RECEIVED SIGNAL TERM
```

Closes #2334 from cxzl25/CELEBORN-1293.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-03-08 15:48:57 +08:00
Chandni Singh
835437f0b9 [CELEBORN-1261] Add auth support to client
### What changes were proposed in this pull request?
This enables client to push and fetch shuffle data securely to Celeborn Workers.

### Why are the changes needed?
This change is required for adding authentication. ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
It is part of bigger change which will be tested end to end.

Closes #2360 from otterc/CELEBORN-1261.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-07 09:54:28 +08:00
Chandni Singh
6897b8be99 [CELEBORN-1234] Master should persist the application meta in Ratis and push it to the Workers
### What changes were proposed in this pull request?
This enables Celeborn Master to persist application meta in Ratis and also push it to Celeborn Workers when it receives the requests for slots from the LifecycleManager.

### Why are the changes needed?
This change is required for adding authentication. ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added some UTs.

Closes #2346 from otterc/CELEBORN-1234.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-03-06 17:02:04 +08:00
zky.zhoukeyong
cae4de1cc1 [CELEBORN-1301] Catch and throw FetchFailedException in CelebornInputStream#fillBuffer
### What changes were proposed in this pull request?
Catch and throw FetchFailedException in CelebornInputStream#fillBuffer to enable spark's stage rerun
when fillBuffer encounters fetch chunk exception

### Why are the changes needed?
ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2349 from waitinfuture/1301.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-03-03 13:32:40 +08:00
Chandni Singh
d5a1bcdb6d [CELEBORN-1256] Added internal port and auth support to Celeborn worker
### What changes were proposed in this pull request?

This adds an internal port and auth support to Celeborn Wokers.
1. Internal port is used by a worker to receive messages from Celeborn Master.
2. Authentication support for secure communication with clients. This change doesn't add the support in clients to communicate to the Workers securely. That will be in a future change.

This change targets just adding the port and auth support to Worker. The following items from the proposal are still pending:

- Persisting the app secrets in Ratis.
- Forwarding secrets to Workers and having ability for the workers to pull registration info from the Master.
- Secured communication between workers and clients.

### Why are the changes needed?
It is needed for adding authentication support to Celeborn ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011))

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Part of a bigger change. For this change, only modified existing UTs.

Closes #2292 from otterc/CELEBORN-1256.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
2024-02-29 10:09:22 +08:00
SteNicholas
aecae8161b [CELEBORN-1239][FOLLOWUP] Deprecate celeborn.quota.configuration.path config
### What changes were proposed in this pull request?

Deprecate `celeborn.quota.configuration.path` config. User `celeborn.dynamicConfig.store.fs.path` instead.

### Why are the changes needed?

`DefaultQuotaManager` is removed in #2298, which causes that `celeborn.quota.configuration.path` is useless. `celeborn.quota.configuration.path` could be deprecated that uses `celeborn.dynamicConfig.store.fs.path` to config quota.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2339 from SteNicholas/CELEBORN-1239.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-02-27 22:58:25 +08:00
SteNicholas
2dd0a1df4a [CELEBORN-1296] Introduce celeborn.dynamicConfig.store.fs.path config to configure the path of dynamic config file for fs store backend
### What changes were proposed in this pull request?

Introduce `celeborn.dynamicConfig.store.fs.path` config to configure the path of dynamic config file for fs store backend.

### Why are the changes needed?

`FsConfigServiceImpl` uses `celeborn.quota.configuration.path` to configure the path of dynamic config file for fs store backend at present. The path of dynamic config file should be introduced with `celeborn.dynamicConfig.store.fs.path` instead of quota configuration path.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2337 from SteNicholas/CELEBORN-1296.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-02-27 19:17:13 +08:00
sychen
f4bc87e6db
[CELEBORN-1288] Prompt configuration items when receiving IdleStateEvent
### What changes were proposed in this pull request?
When an `IdleStateEvent ` is received, the configuration items of the corresponding module are output.

### Why are the changes needed?
Now that the `IdleStateEvent` event is received, only the timeout time is output, but the corresponding configuration items are not output.

```
24/02/26 04:12:08,062 [data-client-5-8] ERROR TransportChannelHandler: Connection to /XXX:YYY has been quiet for 240000 ms while there are outstanding requests.
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2329 from cxzl25/CELEBORN-1288.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-02-26 23:19:38 +08:00
liangyongyuan
4ddc91afda [CELEBRON-1282] Optimize push data replica error message
### What changes were proposed in this pull request?
Optimize the handling of exceptions during the push of replica data, now only throwing PUSH_DATA_CONNECTION_EXCEPTION_REPLICA in specific scenarios.

### Why are the changes needed?
When handling exceptions related to pushing replica data in the worker, unmatched exceptions, such as 'file already closed,' are uniformly transformed into REPLICATE_DATA_CONNECTION_EXCEPTION_COUNT and returned to the client. The client then excludes the peer node based on this count, which may not be appropriate in certain scenarios. For instance, in the case of an exception like 'file already closed,' it typically occurs during multiple splits and commitFile operations. Excluding a large number of nodes under such circumstances is clearly not in line with expectations.
![image](https://github.com/apache/incubator-celeborn/assets/46274164/816d21ad-1f79-45f0-bbe7-e93e15389edd)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
through exist uts

Closes #2323 from lyy-pineapple/CELEBORN-1282.

Authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
2024-02-26 12:55:26 +08:00
Angerszhuuuu
7b0211e345 [CELEBORN-1277] Add celeborn.quota.enabled at Master and Client side to enable checking quota
### What changes were proposed in this pull request?

Add `celeborn.quota.enabled` at Master and Client side to enable checking quota

### Why are the changes needed?

`celeborn.quota.enabled` should be added in Master and Client side to enable quota check for Celeborn Master and Client.

### Does this PR introduce _any_ user-facing change?

Add categories of `celeborn.quota,enabled` with `master` and `client`.

### How was this patch tested?

No.

Closes #2318 from AngersZhuuuu/CELEBORN-1277.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2024-02-26 11:33:14 +08:00
Chandni Singh
9185cae35a [CELEBORN-1257][FOLLOWUP] Removed the additional secured port from Celeborn Master
### What changes were proposed in this pull request?
https://github.com/apache/incubator-celeborn/pull/2292#discussion_r1497160753
Based on the above discussion, removing the additional secured port. The existing port will be used for secured communication when auth is enabled.

### Why are the changes needed?
These changes are for enabling authentication

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
This removed additional secured port.

Closes #2327 from otterc/CELEBORN-1257.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
2024-02-25 00:09:05 +08:00
SteNicholas
ff0cf15770
[CELEBORN-1283] TransportClientFactory avoid contention and get or create clientPools quickly
### What changes were proposed in this pull request?

`TransportClientFactory` avoid contention and get or create clientPools quickly.

### Why are the changes needed?

Avoid contention for getting or creating clientPools, and clean up the code.

Backport: [[SPARK-38555][NETWORK][SHUFFLE] Avoid contention and get or create clientPools quickly in the TransportClientFactory](https://github.com/apache/spark/pull/35860)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2322 from SteNicholas/CELEBORN-1283.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-02-23 15:30:24 +08:00
SteNicholas
b9bdea3c72
[CELEBORN-1280] Change default value of celeborn.worker.graceful.shutdown.recoverDbBackend to ROCKSDB
### What changes were proposed in this pull request?

Change the default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` from `LEVELDB` to `ROCKSDB`.

### Why are the changes needed?

Because the LevelDB support will be removed, the default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` could be changed to ROCKSDB instead of LEVELDB for preparation of LevelDB deprecation.

Backport:
 [[SPARK-45351][CORE] Change spark.shuffle.service.db.backend default value to ROCKSDB](https://github.com/apache/spark/pull/43142)
 [[SPARK-45413][CORE] Add warning for prepare drop LevelDB support](https://github.com/apache/spark/pull/43217)

### Does this PR introduce _any_ user-facing change?

The default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` is changed from `LEVELDB` to `ROCKSDB`.

### How was this patch tested?

No.

Closes #2320 from SteNicholas/CELEBORN-1280.

Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-02-23 14:53:24 +08:00
Fei Wang
621a719d8d
[CELEBORN-1278] Avoid calculating all outstanding requests to improve performance
### What changes were proposed in this pull request?
Refer [SPARK-37894](https://issues.apache.org/jira/browse/SPARK-37984)/ https://github.com/apache/spark/pull/35276
Avoid calculating all outstanding requests to improve performance

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
Not needed.

Closes #2319 from turboFei/SPARK-37984_backport.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-02-23 13:58:55 +08:00
Fei Wang
387bffc0a3 [CELEBORN-1275] Fix bug that callback function may hang when unchecked exception missed
### What changes were proposed in this pull request?
Refer: [SPARK-28160](https://issues.apache.org/jira/browse/SPARK-28160) / https://github.com/apache/spark/pull/24964
ByteBuffer.allocate may throw OutOfMemoryError when the response is large but no enough memory is available. However, when this happens, TransportClient.sendRpcSync will just hang forever if the timeout set to unlimited.

### Why are the changes needed?
To catch the exception of `ByteBuffer.allocate` in corner case.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Quote the local test in https://github.com/apache/spark/pull/24964
```
I tested in my IDE by setting the value of size to -1 to verify the result. Without this patch, it won't be finished until timeout (May hang forever if timeout set to MAX_INT), or the expected IllegalArgumentException will be caught.

      Override
      public void onSuccess(ByteBuffer response) {
        try {
          int size = response.remaining();
          ByteBuffer copy = ByteBuffer.allocate(size); // set size to -1 in runtime when debug
          copy.put(response);
          // flip "copy" to make it readable
          copy.flip();
          result.set(copy);
        } catch (Throwable t) {
          result.setException(t);
        }
      }
```

Closes #2316 from turboFei/fix_transport_client_onsucess.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: chenfu <chenfu@xiaohongshu.com>
2024-02-23 10:27:54 +08:00
Angerszhuuuu
b94b86943e [CELEBORN-1276] Move checkQuotaSpaceAvailable from Quota to QuotaManager
### What changes were proposed in this pull request?
Move checkQuotaSpaceAvailable from Quota to QuotaManager

### Why are the changes needed?
Put method in correct place

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT

Closes #2317 from AngersZhuuuu/CELEBORN-1276.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2024-02-23 10:25:40 +08:00
Angerszhuuuu
0c952ca915 [CELEBORN-1239][FEATURE] Celeborn QuotaManager support use ConfigService and support default quota setting
### What changes were proposed in this pull request?
This pr does 2 things:
1. Remove unnecessary conf QUOTA_MANAGER since we implement it with ConfigService and ConfigService already have a conf to indicate the implement method.
2. Move the quota manager to Master side since only master use this
3. Support quota manager use FsConfigService and support default system level

### Why are the changes needed?
1. Many times, for users who do not have a quota configured, we hope to have a default quota that applies to them.
2. Quota manager should support refresh
3. QuotaManager should support integrate with ConfigService

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
added ut

Closes #2298 from AngersZhuuuu/CELEBORN-1239.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2024-02-22 18:00:19 +08:00
Fei Wang
252376981f [MINOR] Fix typos and wrong package name
### What changes were proposed in this pull request?

Fix some typos.

### Why are the changes needed?

Ditto.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Not needed.

Closes #2314 from turboFei/fix_typo.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-02-22 06:42:23 +00:00
SteNicholas
d71f16f7bf [CELEBORN-1254][FOLLOWUP] Rename celeborn.worker.sortPartition.reservedMemory.enabled to celeborn.worker.sortPartition.prefetch.enabled
### What changes were proposed in this pull request?

Rename `celeborn.worker.sortPartition.reservedMemory.enabled` to `celeborn.worker.sortPartition.prefetch.enabled`. Address [r1469066327](https://github.com/apache/incubator-celeborn/pull/2264/files#r1469066327) of pan3793.

### Why are the changes needed?

`celeborn.worker.sortPartition.reservedMemory.enabled` is misleading, which should represent that prefetch the original partition files during the first sequential reading path to leverage the Linux PageCache mechanism to speed up the subsequent random reading of them. The config name could use `celeborn.worker.sortPartition.prefetch.enabled` which is is more accurate.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2312 from SteNicholas/CELEBORN-1254.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-02-21 14:05:40 +08:00
Fei Wang
4f23af49c9 [CELEBORN-1273] Move java classes under scala src to java
### What changes were proposed in this pull request?
Just fix minor typo:
```
ls **/scala/**/*.java
common/src/main/scala/org/apache/celeborn/common/meta/WorkerEventInfo.java common/src/main/scala/org/apache/celeborn/common/meta/WorkerStatus.java
```

After this:
```
ls **/scala/**/*.java
zsh: no matches found: **/scala/**/*.java

ls **/java/**/*.scala
zsh: no matches found: **/java/**/*.scala
```
### Why are the changes needed?
Fix code format.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT.

Closes #2310 from turboFei/scala_java.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-02-20 15:34:18 +08:00
Angerszhuuuu
92704c7d06 [CELEBORN-1051] Add isDynamic property for CelebornConf
### What changes were proposed in this pull request?
Since we support ConfigService, many configuration can be dynamic, add `isDynamic` property for CelebornConf in this pr.

### Why are the changes needed?
Make configuration doc more cleear

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existed UT

Closes #2308 from AngersZhuuuu/CELEBORN-1051.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2024-02-20 14:20:44 +08:00
SteNicholas
64b4338291 [CELEBORN-1052][FOLLOWUP] Improve the implementation of ConfigService
### What changes were proposed in this pull request?

Improve the implementation of `ConfigService` including:

- Removes `celeborn.dynamicConfig.enabled`.
- Changes `celeborn.dynamicConfig.store.backend` to optional.
- Renames `refreshAllCache` to `refreshCache` in `ConfigService`.
- Checks whether the dynamic config file exists and is file in `FsConfigServiceImpl`.

### Why are the changes needed?

Whether to enable dynamic config could check via whether `celeborn.dynamicConfig.store.backend` is provided, instead of `celeborn.dynamicConfig.enabled`. The `refreshAllCache` interface could rename to `refreshCache` and throw Exception simply. Meanwhile, `FsConfigServiceImpl` should check whether the dynamic config file exists and is file.

### Does this PR introduce _any_ user-facing change?

- Renames `refreshAllCache` to `refreshCache` in `ConfigService`.

### How was this patch tested?

- `ConfigServiceSuiteJ`

Closes #2304 from SteNicholas/CELEBORN-1052.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-02-19 22:42:10 +08:00
Fei Wang
7a05b2fc18 [CELEBORN-1016] Fix IPv6 host address resolve issue
### What changes were proposed in this pull request?

To close CELEBORN-1016, fix the issue when parse IPv6 host address.
### Why are the changes needed?

Fix CELEBORN-1016

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UT.

Closes #2293 from turboFei/CELEBORN-1016_ipv6.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-02-17 10:31:49 +08:00
xiyu.zk
c1837536a1 [CELEBORN-1267] Add config to control worker check in CelebornShuffleFallbackPolicyRunner
### What changes were proposed in this pull request?
As title.

### Why are the changes needed?
For some scenarios, if Celeborn cannot be used, users want to report an error directly instead of fallback.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI

Closes #2291 from kerwin-zk/add-config.

Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-02-07 19:01:41 +08:00
Chandni Singh
ab4c0bc85b [CELEBORN-1257] Adds a secured port in Celeborn Master for secure communication with LifecycleManager
### What changes were proposed in this pull request?
This adds a secured port to Celeborn Master which is used for secure communication with LifecycleManager.
This is part of adding authentication support in Celeborn (see CELEBORN-1011).

This change targets just adding the secured port to Master. The following items from the proposal are still pending:
1. Persisting the app secrets in Ratis.
2. Forwarding secrets to Workers and having ability for the workers to pull registration info from the Master.
3. Secured and internal port in Workers.
4. Secured communication between workers and clients.

In addition, since we are supporting both secured and unsecured communication for backward compatibility and seamless rolling upgrades, there is an additional change needed. An app which registers with the Master can try to talk to the workers on unsecured ports which is a security breach. So, the workers need to know whether an app registered with Master or not and for that Master has to propagate list of un-secured apps to Celeborn workers as well. We can discuss this more with https://issues.apache.org/jira/browse/CELEBORN-1261

### Why are the changes needed?
It is needed for adding authentication support to Celeborn (CELEBORN-1011)

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added a simple UT.

Closes #2281 from otterc/CELEBORN-1257.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-02-06 14:53:28 +08:00
Shuang
d89dcf0e06 [CELEBORN-1054] Support db based dynamic config service
### What changes were proposed in this pull request?

Support database based store backend implementation for dynamic configuration management

### Why are the changes needed?

Currently celeborn provides `FsConfigServiceImpl` implementation for dynamic config service which is based on file system, We cloud Support database based store backend implementation.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- `ConfigServiceSuiteJ#testDbConfig`

Closes #2273 from RexXiong/CELEBORN-1054.

Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-02-05 13:23:25 +08:00
SteNicholas
4c5e6f065c
[CELEBORN-1182] Support application dimension ActiveConnectionCount metric to record the number of registered connections for each application
### What changes were proposed in this pull request?

`WorkerSource` supports application dimension `ActiveConnectionCount` metric to record the number of registered connections for each application.

### Why are the changes needed?

`ActiveConnectionCount` metric records the number of registered connections at present. It's recommended to support dimension ActiveConnectionCount metric to record the number of registered connections for each application in Worker. Application dimension `ActiveConnectionCount` metric could provide users with the actual number of registered connections for each application.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Internal tests.

Closes #2167 from SteNicholas/CELEBORN-1182.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-02-02 16:29:10 +08:00
SteNicholas
aad3929018
[CELEBORN-1259] Improve the default gracePeriod of ThreadUtils#shutdown
### What changes were proposed in this pull request?

Introduce `ThreadUtils#shutdown(executor)` method to improve the default gracePeriod of `ThreadUtils#shutdown`.

### Why are the changes needed?

The default value of `gracePeriod` for `ThreadUtils#shutdown` is 30 seconds at present. Meanwhile, the `gracePeriod` of most invoker for `ThreadUtils#shutdown` is 800 milliseconds. Therefore, the default `gracePeriod` of `ThreadUtils#shutdown` could be improved as 800 milliseconds.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2276 from SteNicholas/CELEBORN-1259.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-02-01 18:13:36 +08:00
SteNicholas
05fa11b3a0 [CELEBORN-1174] Introduce application dimension resource consumption metrics
### What changes were proposed in this pull request?

Introduce application dimension resource consumption metrics for `ResourceConsumptionSource`.

### Why are the changes needed?

`ResourceConsumption` namespace metrics are generated for each user and they are identified using a metric tag at present. It's recommended to introduce application dimension resource consumption metrics that expose application dimension resource consumption of Master and Worker. By monitoring resource consumption in the application dimension, you can obtain the actual situation of application resource consumption.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `WorkerInfoSuite#WorkerInfo toString output`
- `PbSerDeUtilsTest#fromAndToPbResourceConsumption`
- `MasterStateMachineSuitej#testObjSerde`

Closes #2161 from SteNicholas/CELEBORN-1174.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-02-01 15:24:29 +08:00
Shuang
e71d912d50 [CELEBORN-1245] Support Celeborn Master(Leader) to manage workers
### What changes were proposed in this pull request?
1. Support Celeborn Master(Leader) to manage workers by sending event when heartbeat
2. Add Worker Status to Worker then we can know the status of the workers(such as during decommission...)
3. Add Http interface for master to handleWorkerEvent/getWorkerEvent

### Why are the changes needed?
Currently, we only support managing the status of workers on the worker side. This pr supports the master to manage the status of all workers. By sending events such as (Decommission/Graceful/Exit) when heartbeat, workers can be asynchronously execute the command from master. MeanWhile we can't know what the worker status during worker decommission so this pr add worker status to tell the exactly status of the worker.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA

Closes #2255 from RexXiong/CELEBORN-1245.

Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-02-01 09:44:59 +08:00
Xianming Lei
62c8ac9c52 [CELEBORN-1241] Introduce hot load for CelebornRackResolver
### What changes were proposed in this pull request?
 Introduce hot load for CelebornRackResolver.

### Why are the changes needed?
In production environment, we often expand the machine, so the rack configuration also needs to be updated in time.

### Does this PR introduce _any_ user-facing change?
master.md

### How was this patch tested?
UTs.

Closes #2246 from leixm/issue_1241.

Lead-authored-by: Xianming Lei <jerrylei@apache.org>
Co-authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2024-01-30 18:44:02 +08:00
Chandni Singh
5d7892988a [CELEBORN-1012] Add a dedicated internal port in Master to talk to Workers and other Masters
### What changes were proposed in this pull request?
With authentication ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)), the handling of messages from Client by Celeborn services (Master/workers) will go through a SASL handshake. However, messages exchanged between Masters and Workers will not.
With a single netty server on the Master/Workers handling both client and master/workers messages, differentiating between the two types of connection is a challenge. It will be better if Master/Workers have a separate designated port for Clients and a separate one for internal components (workers and other Masters).

In this change, we propose
- the config that enables creating dedicated internal ports on Masters/Workers.
- creation of the dedicated internal port in just the Master. A subsequent PR will add that creation of the dedicated internal port in Workers.

### Why are the changes needed?
This change is required for adding authentication. ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)).

### Does this PR introduce _any_ user-facing change?
Yes, there are new configurations added.

### How was this patch tested?
Added a UT

Closes #2265 from otterc/CELEBORN-1012.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-30 09:53:35 +08:00
CodingCat
916df8f6ec [CELEBORN-1235] Start test nodes in random ports to allow multiple builds run in the same ci server
### What changes were proposed in this pull request?

this PR is to improve the test implementations so that it starts test nodes in random ports instead of using the hardcoded ones

### Why are the changes needed?

currently the test nodes are started in the hard coded ports, this prevents to run multiple builds in the same CI/CD server (which is not uncommonly seen in many companies infra)

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

it runs in our private CI/CD infra with many parallel builds very well

Closes #2237 from CodingCat/enhance_test.

Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-27 12:51:53 +08:00
SteNicholas
954277d07c [CELEBORN-1254] PartitionFilesSorter seeks to position of each block and does not warm up for non-hdfs files
### What changes were proposed in this pull request?

Introduce `celeborn.worker.sortPartition.reservedMemory.enabled` to support that `PartitionFilesSorter` seeks to position of each block and does not warmup for non-hdfs files.

### Why are the changes needed?

File sorting includes three steps: reading files, sorting MapIds, and writing files. The default block of Celeborn is 256k, and the number of blocks is about 1000, so the sorting process is very fast, and the main overhead is file reading and writing. There are roughly three options for the entire sorting process:

1. Memory of the file size is allocated in advance, the file is read in as a whole, MapId is parsed and sorted, and Blocks are written back to the disk in MapId order.
2. No memory is allocated, seek to the location of each block, parse and sort the MapId, and transfer the Blocks of the original file to the new file in the order of MapId.
3. Allocate a small block of memory (such as 256k), read the entire file sequentially, parse and sort the MapId, and transfer the block of the original file to the new file in the order of MapId.

From an IO perspective, at first glance, solution 1 uses sufficient memory and there is no sequential reading and writing; solution 2 has random reading and random writing; solution 3 has sequential writing. Intuitively solution 1 has better performance. Due to the existence of PageCache, when writing a file in solution 3, the original file is likely to be cached in PageCache. `PartitionFilesSorter` support solution3 with PageCache at present, which has better performance especially HDD disk. It's better to support solution2 with switch config that seeks to position of each block and does not warm up for non-hdfs files especially SDD disk.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA and cluster.

Closes #2264 from SteNicholas/CELEBORN-1254.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
2024-01-26 19:20:06 +08:00
Angerszhuuuu
3ffed66c40 [CELEBORN-1236][METRICS] Celeborn add metrics about thread pool
### What changes were proposed in this pull request?
Add metrics about worker's thread pool, help admin to observe the thread pool's work status.

ThreadPool list as below:

1. celeborn-dispatcher
2. celeborn-netty-rpc-connection-executor
3. worker-disk-{mount_point}-cleaner
4. worker-device-checker
5. flusher-{mount_point}
6. worker-file-sorter-executor
7. worker-data-replicator
8. worker-files-committer
9. worker-expired-shuffle-cleaner

```
metrics_active_thread_count_Value{role="Worker",threadPool="celeborn-dispatcher"} 64 1706237338484
metrics_pending_task_count_Value{role="Worker",threadPool="celeborn-dispatcher"} 0 1706237338484
metrics_pool_size_Value{role="Worker",threadPool="celeborn-dispatcher"} 64 1706237338484
metrics_core_pool_size_Value{role="Worker",threadPool="celeborn-dispatcher"} 64 1706237338484
metrics_maximum_pool_size_Value{role="Worker",threadPool="celeborn-dispatcher"} 64 1706237338484
metrics_largest_pool_size_Value{role="Worker",threadPool="celeborn-dispatcher"} 64 1706237338484
metrics_active_thread_count_Value{role="Worker",threadPool="celeborn-netty-rpc-connection-executor"} 0 1706237338484
metrics_pending_task_count_Value{role="Worker",threadPool="celeborn-netty-rpc-connection-executor"} 0 1706237338484
metrics_pool_size_Value{role="Worker",threadPool="celeborn-netty-rpc-connection-executor"} 0 1706237338484
metrics_core_pool_size_Value{role="Worker",threadPool="celeborn-netty-rpc-connection-executor"} 64 1706237338484
metrics_maximum_pool_size_Value{role="Worker",threadPool="celeborn-netty-rpc-connection-executor"} 64 1706237338484
metrics_largest_pool_size_Value{role="Worker",threadPool="celeborn-netty-rpc-connection-executor"} 1 1706237338484
metrics_active_thread_count_Value{role="Worker",threadPool="worker-disk-/-cleaner"} 0 1706237338484
metrics_pending_task_count_Value{role="Worker",threadPool="worker-disk-/-cleaner"} 0 1706237338484
metrics_pool_size_Value{role="Worker",threadPool="worker-disk-/-cleaner"} 0 1706237338484
metrics_core_pool_size_Value{role="Worker",threadPool="worker-disk-/-cleaner"} 4 1706237338484
metrics_maximum_pool_size_Value{role="Worker",threadPool="worker-disk-/-cleaner"} 4 1706237338484
metrics_largest_pool_size_Value{role="Worker",threadPool="worker-disk-/-cleaner"} 0 1706237338485
metrics_active_thread_count_Value{role="Worker",threadPool="worker-device-checker"} 0 1706237338485
metrics_pending_task_count_Value{role="Worker",threadPool="worker-device-checker"} 0 1706237338485
metrics_pool_size_Value{role="Worker",threadPool="worker-device-checker"} 2 1706237338485
metrics_core_pool_size_Value{role="Worker",threadPool="worker-device-checker"} 5 1706237338485
metrics_maximum_pool_size_Value{role="Worker",threadPool="worker-device-checker"} 5 1706237338485
metrics_largest_pool_size_Value{role="Worker",threadPool="worker-device-checker"} 2 1706237338485
metrics_thread_count_Value{role="Worker",threadPool="LocalFlusher1441328175-/"} 2 1706237338485
metrics_thread_is_terminated_count_Value{role="Worker",threadPool="LocalFlusher1441328175-/"} 0 1706237338485
metrics_thread_is_shutdown_count_Value{role="Worker",threadPool="LocalFlusher1441328175-/"} 0 1706237338485
metrics_active_thread_count_Value{role="Worker",threadPool="worker-file-sorter-executor"} 0 1706237338485
metrics_pending_task_count_Value{role="Worker",threadPool="worker-file-sorter-executor"} 0 1706237338485
metrics_pool_size_Value{role="Worker",threadPool="worker-file-sorter-executor"} 0 1706237338485
metrics_core_pool_size_Value{role="Worker",threadPool="worker-file-sorter-executor"} 24 1706237338485
metrics_maximum_pool_size_Value{role="Worker",threadPool="worker-file-sorter-executor"} 24 1706237338485
metrics_largest_pool_size_Value{role="Worker",threadPool="worker-file-sorter-executor"} 0 1706237338485
metrics_active_thread_count_Value{role="Worker",threadPool="worker-data-replicator"} 0 1706237338485
metrics_pending_task_count_Value{role="Worker",threadPool="worker-data-replicator"} 0 1706237338485
metrics_pool_size_Value{role="Worker",threadPool="worker-data-replicator"} 0 1706237338485
metrics_core_pool_size_Value{role="Worker",threadPool="worker-data-replicator"} 64 1706237338485
metrics_maximum_pool_size_Value{role="Worker",threadPool="worker-data-replicator"} 64 1706237338485
metrics_largest_pool_size_Value{role="Worker",threadPool="worker-data-replicator"} 0 1706237338485
metrics_active_thread_count_Value{role="Worker",threadPool="worker-files-committer"} 0 1706237338485
metrics_pending_task_count_Value{role="Worker",threadPool="worker-files-committer"} 0 1706237338485
metrics_pool_size_Value{role="Worker",threadPool="worker-files-committer"} 0 1706237338485
metrics_core_pool_size_Value{role="Worker",threadPool="worker-files-committer"} 32 1706237338485
metrics_maximum_pool_size_Value{role="Worker",threadPool="worker-files-committer"} 32 1706237338485
metrics_largest_pool_size_Value{role="Worker",threadPool="worker-files-committer"} 0 1706237338485
metrics_active_thread_count_Value{role="Worker",threadPool="worker-expired-shuffle-cleaner"} 0 1706237338485
metrics_pending_task_count_Value{role="Worker",threadPool="worker-expired-shuffle-cleaner"} 0 1706237338485
metrics_pool_size_Value{role="Worker",threadPool="worker-expired-shuffle-cleaner"} 2 1706237338485
metrics_core_pool_size_Value{role="Worker",threadPool="worker-expired-shuffle-cleaner"} 64 1706237338485
metrics_maximum_pool_size_Value{role="Worker",threadPool="worker-expired-shuffle-cleaner"} 64 1706237338485
metrics_largest_pool_size_Value{role="Worker",threadPool="worker-expired-shuffle-cleaner"} 2 1706237338485
```
### Why are the changes needed?
Help observe server status

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
MT

Closes #2239 from AngersZhuuuu/CLEBORN-1236.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
2024-01-26 18:14:05 +08:00
Angerszhuuuu
5c54388bc2
[CELEBORN-1252] Fix resource consumption of worker does not update when update interval is greater than heartbeat interval
### What changes were proposed in this pull request?

 Resource consumption of worker does not update when update interval of resource consumpution is greater than heartbeat interval.

<img width="1741" alt="截屏2024-01-24 14 49 50" src="https://github.com/apache/incubator-celeborn/assets/46485123/21cfd412-c69e-4955-8bc8-155ee470697d">

This pull request introduces below changes:

1. Avoid master repeat add gauge for same user
2. For worker, user resource consumption can directly get from worker's snapshot, didn't need update interval

### Why are the changes needed?

No.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2260 from AngersZhuuuu/CELEBORN-1252.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-25 20:28:19 +08:00
Chandni Singh
b7d6704cc8 [CELEBORN-1251] Connect the server and client bootstraps to RpcEnv
### What changes were proposed in this pull request?
This connects client/server bootstraps to the RpcEnv in Celeborn. This is a prerequisite for leveraging RPC security in subsequent PRs where we will add Sasl authentication to the communication between the client and Celeborn Master/Workers.
It is part of the epic: https://issues.apache.org/jira/browse/CELEBORN-1011.

### Why are the changes needed?
The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added some UTs

Closes #2257 from otterc/CELEBORN-1251.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
2024-01-25 17:28:48 +08:00
mingji
0249698c5b
[CELEBORN-1247] Output config's alternatives to doc
### What changes were proposed in this pull request?
Add configs' alternatives to doc.

### Why are the changes needed?
To help users use correct configs.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA.

Closes #2253 from FMX/b1241.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-24 11:21:23 +08:00
Angerszhuuuu
67e6cbfb51
[CELEBORN-1242] Unify celeborn thread name format
### What changes were proposed in this pull request?

Unify celeborn thread name format with the following pattern:

- client: `celeborn-client-[component]-[function]er`
- service: `[master|worker]-[component]-[function]er`
- other: `celeborn-[component]-[function]er`

### Why are the changes needed?

It's recommended to unify celeborn thread name format especially client side for application.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2248 from AngersZhuuuu/CELEBORN-1242.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-23 16:56:40 +08:00
Chandni Singh
a86a315bd5 [CELEBORN-1229] Support for application registration with Celeborn Master
### What changes were proposed in this pull request?
This adds support for applications to register with Celeborn Master by introducing the `RegistrationClientBootstrap`, `RegistrationServerBootstrap`, and `RegistrationRpcHandler` classes, which facilitate the client connection setup with the Celeborn Master. The registration protocol details are described in the [auth proposal](https://docs.google.com/document/d/1D1U2COYhS3ob7l0t2WghRhBk_Fci9RGx-2FBXA3nvXk/edit#heading=h.po9dc3r1kb3k).

### Why are the changes needed?
The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
Add the config `celeborn.auth.enabled`

### How was this patch tested?
Added UTs.

Closes #2231 from otterc/CELEBORN-1229.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-23 09:31:19 +08:00
SteNicholas
9107779174
[CELEBORN-1225][FOLLOWUP] Worker should build replicate factory to get client for sending replicate data
### What changes were proposed in this pull request?

`PushDataHandler` should build replicate factory to get client for sending replicate data instead of push client factory. Meanwhile, timeout checker of `TransportResponseHandler` should run with `replicate` module instead of `push`.

Follow up #2232.

### Why are the changes needed?

`PushDataHandler` uses push client factory to create client for replicating, which should use replicate factory, otherwise replicate module configuration does not take effect for replicating of worker server. Meanwhile, timeout checker of `TransportResponseHandler` runs with `push` module, which does not work well with replicate client for worker.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA and cluster.

Closes #2241 from SteNicholas/CELEBORN-1225.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-19 09:46:27 +08:00
SteNicholas
30608ea698
[CELEBORN-1225] Worker should build replicate factory to get client for sending replicate data
### What changes were proposed in this pull request?

`PushDataHandler` should build replicate factory to get client for sending replicate data instead of push client factory.

### Why are the changes needed?

`PushDataHandler` uses push client factory to create client for replicating, which should use replicate factory, otherwise replicate module configuration does not take effect for replicating of worker server.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA and cluster.

Closes #2232 from SteNicholas/CELEBORN-1225.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-17 16:40:46 +08:00
Fei Wang
d46b6623b3
[CELEBORN-1228] Format the timestamp when recording worker failure
### What changes were proposed in this pull request?

Format the timestamp when recoding worker failure inforamtion.

### Why are the changes needed?

Now the long type timestamp is difficult to view and confuse without reading source code.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2230 from turboFei/date_format.

Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-17 14:04:30 +08:00
zky.zhoukeyong
d392302f0c [CELEBORN-1224] Make TransportMessage#type transient for backward compatibility
### What changes were proposed in this pull request?
When I'm testing upgrading Master from 0.3.2 to 0.4.0, I encountered the following error:

![image](https://github.com/apache/incubator-celeborn/assets/948245/e519a6f6-4d60-4adc-b680-d8e0cb6e04eb)

This PR fixes the backward compatibility exception.

### Why are the changes needed?
ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test

Closes #2228 from waitinfuture/b1224.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2024-01-16 09:32:27 +08:00
sychen
5f0a8406c6
[CELEBORN-1133][FOLLOWUP] Refactor FileInfo
### What changes were proposed in this pull request?

### Why are the changes needed?
```
common/src/main/java/org/apache/celeborn/common/meta/DiskFileInfo.java:[110,14] [MissingOverride] getFileLength implements method in FileInfo; expected Override
common/src/main/java/org/apache/celeborn/common/meta/MapFileMeta.java:[40,38] [InconsistentCapitalization] Found the field 'numSubPartitions' with the same name as the parameter 'numSubpartitions' but with different capitalization.
worker/src/main/java/org/apache/celeborn/service/deploy/worker/storage/MapPartitionDataWriter.java:[164,65] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2222 from cxzl25/CELEBORN-1133-FOLLOWUP.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2024-01-11 11:09:15 +08:00
Cheng Pan
bb86074163
[CELEBORN-1202][FOLLOWUP] Update LICENSE and NOTICE files
### What changes were proposed in this pull request?

Update LICENSE and NOTICE files according to the mailing list comments.

### Why are the changes needed?

https://lists.apache.org/thread/zw5cw621dqgbktdolx7qynho0zt451pk

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Review.

Closes #2213 from pan3793/CELEBORN-1202-followup.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2024-01-10 19:26:54 +08:00
Chandni Singh
ced6f93bc1 [CELEBORN-1212] Support for Anonymous SASL Mechanism
### What changes were proposed in this pull request?
This adds support for ANONYMOUS Sasl Mechanism.

### Why are the changes needed?
The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT.

Closes #2210 from otterc/CELEBORN-1212.

Lead-authored-by: Chandni Singh <singh.chandni@gmail.com>
Co-authored-by: otterc <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-06 20:16:23 +08:00
Aaron Wang
c0b7ff4477 [MINOR] Fix typos
### What changes were proposed in this pull request?
- Fix some typos.

### Why are the changes needed?
- Ditto.

### Does this PR introduce _any_ user-facing change?
- No.

### How was this patch tested?
- No need.

Closes #2214 from Radeity/fix-typo.

Authored-by: Aaron Wang <wangweirao16@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-06 20:08:17 +08:00
mingji
a3c28d0b34 [CELEBORN-1150] Revert "[] support io encryption for spark"
### What changes were proposed in this pull request?
Revert "[CELEBORN-1150] support io encryption for spark".

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2208 from FMX/b1150-3.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-04 13:00:58 +08:00
mingji
8b6aae04d1 [CELEBORN-1201] Optimize memory usage of cache in partition sorter
### What changes were proposed in this pull request?
Add a cache in partition sorted and limit its max size.

### Why are the changes needed?
To reduce memory consumption in partition sort by tweak the index cache.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster.

Closes #2194 from FMX/B1201.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-04 11:34:48 +08:00
sychen
9f8a6bcb79 [CELEBORN-1190][FOLLOWUP] Apply error prone patch and suppress some problems
### What changes were proposed in this pull request?
1. Fix IntLongMath, InconsistentCapitalization, UnnecessaryAssignment
2. disable StringSplitter, EmptyBlockTag, EqualsGetClass, MissingSummary, BadImport

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

```bash
./build/make-distribution.sh --release
```

PR

total  202

```
  43 SynchronizeOnNonFinalField
  24 StaticAssignmentInConstructor
  21 JdkObsolete
  16 ThreadLocalUsage
  16 MutableConstantField
  16 MissingCasesInEnumSwitch
  11 UnusedMethod
  11 NonAtomicVolatileUpdate
   8 UnusedNestedClass
   8 NonOverridingEquals
   8 Finally
   4 MixedMutabilityReturnType
   4 DoubleBraceInitialization
   4 CatchAndPrintStackTrace
   4 CanonicalDuration
   2 ReferenceEquality
   1 ClassCanBeStatic
   1 ByteBufferBackingArray
```

Closes #2180 from cxzl25/error_prone_patch_followup.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-03 15:59:50 +08:00
zwangsheng
d1b30b2827 [CELEBORN-1208][WORKER] Unify parse uniqueId to WorkerInfo
### What changes were proposed in this pull request?

Unify parse uniqueId to WorkerInfo

### Why are the changes needed?

Keep parse uniqueId behavior consistent and avoid multiple changes

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #2202 from zwangsheng/CELEBORN-1208.

Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-02 21:27:58 +08:00
mingji
7be05b430b [CELEBORN-1133] Refactor fileinfo
### What changes were proposed in this pull request?
Rename FileWriter to PartitionLocationDataWriter, add storageManager, delete fileinfo, and flusher in the constructor.

FileInfo(userIdentifier,partitionSplitEnabled,fileMeta)
– NonMemoryFileInfo(streams,filePath,storageType,bytesFlushed)
– MemoryFileInfo(length,buffer)

FileMeta
– reduceFileMeta(chunkOffsets,sorted)
– mapFileMeta(bufferSize,numSubPartitions)

### Why are the changes needed?
1. To make concepts more clear.
2. To support memory storage and HDFS slot management.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster test with worker kill.

Closes #2130 from FMX/b1133.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-02 21:26:10 +08:00
Cheng Pan
77e468161d [CELEBORN-891] Remove pipeline feature for sort based writer
### What changes were proposed in this pull request?

Remove pipeline feature for sort based writer

### Why are the changes needed?

The pipeline feature is added as part of CELEBORN-295, for performance. Eventually, an unresolvable issue that would crash the JVM was identified in https://github.com/apache/incubator-celeborn/pull/1807, and after discussion, we decided to delete this feature.

### Does this PR introduce _any_ user-facing change?

No, the pipeline feature is disabled by default, there are no changes to users who use the default settings.

### How was this patch tested?

Pass GA.

Closes #2196 from pan3793/CELEBORN-891.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-01 10:42:17 +08:00
liangyongyuan
f8eb1605a1 [CELEBORN-1036][FOLLOWUP] When inflightBatchesPerAddress clear, totalInflightReqs should reset
### What changes were proposed in this pull request?
When   `inflightBatchesPerAddress`  clear in  `InFlightRequestTracker.cleanup `, `totalInflightReqs` should also reset to avoid getting stuck when exiting.

### Why are the changes needed?
`inflightBatchesPerAddress` has cleared and be empty,but totalInflightReqs is always bigger than 0.
![image](https://github.com/apache/incubator-celeborn/assets/46274164/28223f1e-ac9b-4e0b-a26d-9b529af6bca1)

This occurred during the first attempt of the task, where the request for map end failed, but the driver marked that the map has already ended.
![image](https://github.com/apache/incubator-celeborn/assets/46274164/7f43d808-2f9b-4775-b04f-30afe4d31e5a)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
through exists uts

Closes #2191 from lyy-pineapple/celebron-1036.

Authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-27 16:10:12 +08:00
SteNicholas
e7e39a51be
[CELEBORN-1189] Introduce RunningApplicationCount metric and /applications API to record running applications of worker
### What changes were proposed in this pull request?

Introduce `RunningApplicationCount` metric and `/applications` API to record running applications for Celeborn worker.

### Why are the changes needed?

`RunningApplicationCount` metrics only monitors the count of running applications in the cluster for master. Meanwhile, `/listTopDiskUsedApps` API lists the top disk usage application ids for master and worker. Therefore `RunningApplicationCount` metric and `/applications` API could be introduced to record running applications of worker.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Internal tests.

Closes #2172 from SteNicholas/CELEBORN-1189.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-27 09:51:16 +08:00
Angerszhuuuu
f751df50ba [CELEBORN-1192][BUG] Celeborn wait task timeout error message should show correct corresponding batch and target host and port
### What changes were proposed in this pull request?
Celeborn wait task timeout error message should show correct corresponding batch and target host and port

### Why are the changes needed?
Current error log here is confused, can't found out the target hostAndPushPort that have problem.

### Does this PR introduce _any_ user-facing change?
Refactor log help debug

### How was this patch tested?

Closes #2183 from AngersZhuuuu/CELEBORN-1192.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-12-22 16:53:30 +08:00
liangyongyuan
08e7b5962b [CELEBORN-1193] ResettableSlidingWindowReservoir should reset full to false
### What changes were proposed in this pull request?
When ResettableSlidingWindowReservoir reset,  it should reset` full` to `false`, `index` to `0`

### Why are the changes needed?
The ResettableSlidingWindowReservoir class, after invoking the reset operation, resets the data to zero, but fails to reset the 'index' and 'full' variables. Consequently, when retrieving a snapshot in the next operation, it is possible to obtain a considerable amount of zeros. This issue extends to the inaccurate calculation of metrics such as average and minimum values.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add uts

Closes #2182 from lyy-pineapple/slide-bug.

Lead-authored-by: liangyongyuan <2081248500@qq.com>
Co-authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-21 19:53:47 +08:00
liangyongyuan
4304be1a60 [CELEBORN-1172][SPARK] Support dynamic switch shuffle push write mode based on partition number
### What changes were proposed in this pull request?
Dynamically determine the writing mode in Spark based on the number of partitions.

### Why are the changes needed?
Enhance the flexibility of shuffle writes to improve performance.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add uts

Closes #2160 from lyy-pineapple/dynamic-write-mode.

Lead-authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-21 16:58:51 +08:00
zwangsheng
6c2fdf7477
[CELEBORN-1188][TEST] Using JUnit function instead of java assert
### What changes were proposed in this pull request?
Using Junit function instead of java assert.

### Why are the changes needed?
When java assert fail, will throw AssertException, which is hard to find diff.

![截屏2023-12-20 10 34 52](https://github.com/apache/incubator-celeborn/assets/52876270/b36421a5-64e1-4717-a6d4-3b08db403293)

Instead, when we use junit assert, we can clearly find diff.

![截屏2023-12-20 11 17 21](https://github.com/apache/incubator-celeborn/assets/52876270/ce39fa20-e9ab-4419-a4ca-62c4157e4b2c)

### Does this PR introduce _any_ user-facing change?
NO, only test changed

### How was this patch tested?
Run CI

Closes #2173 from zwangsheng/CELEBORN-1188.

Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-12-20 21:20:38 +08:00
sychen
7f653ce7d6 [CELEBORN-1190] Apply error prone patch and suppress some problems
### What changes were proposed in this pull request?
1.  Fix MissingOverride, DefaultCharset, UnnecessaryParentheses Rule
2. Exclude generated sources, FutureReturnValueIgnored, TypeParameterUnusedInFormals, UnusedVariable

### Why are the changes needed?
```
./build/make-distribution.sh --release
```
We get a lot of WARNINGs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2177 from cxzl25/error_prone_patch.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-12-20 20:54:18 +08:00
SteNicholas
089a0f8686
[CELEBORN-1036][FOLLOWUP] totalInflightReqs should decrement when batchIdSet contains the batchId to avoid duplicate caller of removeBatch
### What changes were proposed in this pull request?

`totalInflightReqs` decrements when `batchIdSet` contains the `batchId` to avoid duplicate caller of `removeBatch` in `InFlightRequestTracker`.

### Why are the changes needed?

Caller of `InFlightRequestTracker#removeBatch` may be duplicated, which cause that `totalInflightReqs` could be negative. The source of truth should be that `totalInflightReqs` should decrement when `batchIdSet` contains the `batchId`. If `batchIdSet` does not contain the `batchId`, it does not need to decrement `totalInflightReqs`.

```
23/12/05 20:05:01 [Executor task launch worker for task 17.0 in stage 10.0 (TID 206)] ERROR InFlightRequestTracker: After waiting for 1200000 ms, there are still -1 batches in flight for hostAndPushPort [], which exceeds the current limit 0.
23/12/05 20:05:01 [Executor task launch worker for task 17.0 in stage 10.0 (TID 206)] WARN InFlightRequestTracker: Clear InFlightRequestTracker
23/12/05 20:05:01 [Executor task launch worker for task 17.0 in stage 10.0 (TID 206)] ERROR Executor: Exception in task 17.0 in stage 10.0 (TID 206)
org.apache.celeborn.common.exception.CelebornIOException: Waiting timeout for task 4-17-0 while limiting zero in-flight requests
	at org.apache.celeborn.client.ShuffleClientImpl.limitZeroInFlight(ShuffleClientImpl.java:598)
	at org.apache.celeborn.client.ShuffleClientImpl.prepareForMergeData(ShuffleClientImpl.java:1175)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.close(HashBasedShuffleWriter.java:455)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.write(HashBasedShuffleWriter.java:210)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:100)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:589)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1545)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:594)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Internal tests.

Closes #2134 from SteNicholas/CELEBORN-1036.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-20 18:11:07 +08:00
mingji
4dacf72a6d
[CELEBORN-1150] support io encryption for spark
### What changes were proposed in this pull request?
1. To support io encryption for spark.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and manually test on a cluster.

Closes #2135 from FMX/B1150.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-19 11:44:05 +08:00
Chandni Singh
b09febdd8c [CELEBORN-1176] Server side support for Sasl Auth
### What changes were proposed in this pull request?

This adds the server side Sasl authentication support in the transport layer. Most of this code is taken from Apache Spark.

### Why are the changes needed?

The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UTs.

Closes #2164 from otterc/CELEBORN-1176.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-18 11:27:28 +08:00
zky.zhoukeyong
e361788e48 [CELEBORN-1178] Destroy fail reserved slots in LifecycleManager#reserveSlotsWithRetry
### What changes were proposed in this pull request?
I'm testing main branch and encountered the following scenario.
I run `sbin/stop-worker.sh` near simultaneously on 3 out of 6 workers, and I'm expecting the 3 workers
will soon shutdown because I enabled graceful shutdown. However, only the first worker I stopped
shutdown in 15s as expected, the other two won't shutdown until shutdown timeout.

After digging into it, I found `LifecycleManager#reserveSlotsWithRetry` will reserve for the same location
twice:
1. At T1, only worker1 shutdown, pushes receive HARD_SPLIT and goes to revive
2. At T2, LifecycleManager handles revive requests in batch, and try to reallocate the locs to other workers
3. At T3, reserve to worker3 succeeds because it's not shutdown yet, but reserve to worker2 fails because it's shutdown
4. At T4, LifecycleManager will re-allocate the failed slots to other workers except worker1 and worker2. However, at this time Worker3 is also shutdown, so it fails to reserve on worker3
5. At T5, it re-allocates slots that failed to worker3. However, `getFailedPartitionLocations` will return slots allocated to worker3 in step 3, and increment the epoch to 2. At this time, worker3 has slots of epoch 1, but they will never to pushed to because newer epoch 3 is generated at the same time
6. Since the epoch 2 locs in worker3 will never be pushed to, it will never get a chance to return HARD_SPLIT, as a result it can't fast shutdown untile timeout.

This PR fixes this by destroying failed to be reserved slots in the process of `reserveSlotsWithRetry`

### Why are the changes needed?
ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Before:
![image](https://github.com/apache/incubator-celeborn/assets/948245/50c55524-d37f-494e-a5aa-fba682438cda)
After:
![image](https://github.com/apache/incubator-celeborn/assets/948245/8c90a869-b388-46f3-a86b-a37fd0f4ce0f)

Closes #2163 from waitinfuture/1178.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-17 14:28:04 +08:00
Chandni Singh
600bd53616 [CELEBORN-1180] Changed the version of Sasl Auth related config to 0.5
### What changes were proposed in this pull request?
Changes the version of the config to 0.5 given that 0.4 will be released soon.

### Why are the changes needed?
See above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
NA

Closes #2165 from otterc/CELEBORN-1180.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-16 13:45:46 +08:00
zky.zhoukeyong
309153a99b [CELEBORN-1175] Add UT for commit files
### What changes were proposed in this pull request?
As title.

### Why are the changes needed?
As title.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passes UTs.

Closes #2162 from waitinfuture/1175-2.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-16 01:36:29 +08:00
zky.zhoukeyong
01feb93abb [CELEBORN-1167] Avoid calling parmap when destroy slots
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
One user reported that LifecycleManager's parmap can create huge number of threads and causes OOM.

![image](https://github.com/apache/incubator-celeborn/assets/948245/1e9a0b83-32fe-40d5-8739-2b370e030fc8)

There are four places where parmap is called:

1. When LifecycleManager commits files
2. When LifecycleManager reserves slots
3. When LifecycleManager setup connection to workers
4. When LifecycleManager call destroy slots

This PR fixes the fourth one. To be more detail, this PR eliminates `parmap` when destroying slots, and also replaces `askSync` with `ask`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test and GA.

Closes #2156 from waitinfuture/1167.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-15 18:30:31 +08:00
Chandni Singh
a03ce6c165 [CELEBORN-1157] Add client-side support for Sasl Authentication in the transport layer
### What changes were proposed in this pull request?
This adds the client side Sasl authentication support in the transport layer. Most of this code is taken from Apache Spark.

### Why are the changes needed?
The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
Added a configuration for Sasl request timeout

### How was this patch tested?
Will be adding `CelebornSaslSuiteJ.java` (https://github.com/apache/incubator-celeborn/pull/2105) that tests the end-to-end Sasl flow.

Closes #2139 from otterc/CELEBORN-1157.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-14 22:52:49 +08:00
sychen
2504b50dd2 [CELEBORN-1170] Upgrade snappy-java from 1.1.8.2 to 1.1.10.5
### What changes were proposed in this pull request?

### Why are the changes needed?
https://github.com/apache/incubator-celeborn/pull/2143

The snappy-java 1.1.8.2 version has the follow CVE vulnerabilities, see
https://scout.docker.com/vulnerabilities/id/CVE-2023-43642
https://scout.docker.com/vulnerabilities/id/CVE-2023-34455

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #2158 from cxzl25/CELEBORN-1170.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-14 22:28:32 +08:00
Fu Chen
0f2a9a3a63 [CELEBORN-1160][FOLLOWUP] Update the version for celeborn.client.rpc.shared.threads to 0.3.2
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

Since we are backporting #2145 to branch-0.3, and the configuration entry `celeborn.client.rpc.shared.threads` in #2145
 has a start version of 0.4.0, this update aligns the version accordingly.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2153 from cfmcgrady/celeborn-1160-followup.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-13 15:12:50 +08:00
zky.zhoukeyong
92bebd305d [CELEBORN-1160] Avoid calling parmap when commit files
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
One user reported that LifecycleManager's parmap can create huge number of threads and causes OOM.

![image](https://github.com/apache/incubator-celeborn/assets/948245/1e9a0b83-32fe-40d5-8739-2b370e030fc8)

There are four places where parmap is called:

1. When LifecycleManager commits files
2. When LifecycleManager reserves slots
3. When LifecycleManager setup connection to workers
4. When StorageManager calls close

This PR fixes the first one. To be more detail, this PR eliminates `parmap` when doing committing files, and also replaces `askSync` with `ask`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test and GA.

Closes #2145 from waitinfuture/1160.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-13 14:36:48 +08:00
wangshengjie
8516df4beb [CELEBORN-1151] Request slots when register shuffle should filter the workers excluded by application
### What changes were proposed in this pull request?
When request slots, filter workers excluded by application

### Why are the changes needed?
If worker alive but can not service, register shuffle will remove the worker from application client exclude list and next shuffle may reserve slots on this worker,this will cause application revive unexpectly

### Does this PR introduce _any_ user-facing change?
Yes, request slots will filter workers excluded by application

### How was this patch tested?
UT,

Closes #2131 from wangshengjie123/fix-request-slots-blacklist.

Authored-by: wangshengjie <wangshengjie3@xiaomi.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-12 10:02:18 +08:00
Erik.fang
87b64391ea [CELEBORN-1152] fix GetShuffleId RPC NPE for empty shuffle
### What changes were proposed in this pull request?

In [celeborn-955](https://github.com/apache/incubator-celeborn/pull/1924),  GetShuffleId RPC was introduced to generate a celeborn shuffle id from app shuffle id to support spark stage rerun
GetShuffleId RPC assumes that Shuffle Write operation always happens before Shuffle Read operation, but this is not true for empty shuffle data in celeborn, which causes GetShuffleId RPC to throw NPE and fail the Job
This PR fixes this bug

### Why are the changes needed?
to avoid spark job failure with empty shuffle data

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
a new test case is included for empty shuffle data

Closes #2136 from ErikFang/fix-GetShuffleId-RPC-NPE-for-empty-shuffle.

Lead-authored-by: Erik.fang <fmerik@gmail.com>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-11 20:13:26 +08:00
Chandni Singh
074597d05e [CELEBORN-1147] Added a dedicated API for RPC messages which also accepts an RpcResponseCallback instance
### What changes were proposed in this pull request?

Currently in `BaseMessageHandler` there is a single API for receive which is used for all messages. This makes handling messages when multiple handlers are added messy.

- req.body.release() is only invoked when the handler actually process the message and not delegates it.
- every handler will have to create an instance of RpcResponseCallback for Rpc messages which is exactly the same.

Instead, releasing the message body and creating a callback for Rpc messages can be done in TransportRequestHandler. This avoids:

- code duplication related to RpcResponseCallback in every RPC handler
- every new request handler doesn't need to release the request body. It will be always be done in TransportRequestHandler.

Please note that this is how it is in Apache Spark and with Sasl Authentication, we will add a SaslRpcHandler (https://github.com/apache/incubator-celeborn/pull/2105) which wraps the underlying message handler.

### Why are the changes needed?

The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs and added some more UTs.

Closes #2123 from otterc/CELEBORN-1147.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-09 02:02:15 +08:00
onebox-li
af6fd8a0e6 [CELEBORN-1127] Add JVM classloader metrics
### What changes were proposed in this pull request?
Add JVM classloader metrics for loaded and unloaded count.
![image](https://github.com/apache/incubator-celeborn/assets/19429353/c00eceb3-54e5-4f85-8df1-fe9a6adf6ad4)

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
Add two classloader-related panels.

### How was this patch tested?
Cluster test.

Closes #2099 from onebox-li/add-classloader-metrics.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-07 09:47:23 +08:00
qinrui
04a1e90207 [CELEBORN-1122] Metrics supports json format
### What changes were proposed in this pull request?
If the user does not use prometheus to collect monitoring metrics, but rather some other ones. Using metrics in JSON format would be more user-friendly.The PR supports JSON format for metrics.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
Metrics supports JSON format

### How was this patch tested?
Cluster test.

Closes #2089 from suizhe007/CELEBORN-1122.

Authored-by: qinrui <qr7972@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-12-06 09:24:28 +08:00