Commit Graph

617 Commits

Author SHA1 Message Date
sychen
9f8a6bcb79 [CELEBORN-1190][FOLLOWUP] Apply error prone patch and suppress some problems
### What changes were proposed in this pull request?
1. Fix IntLongMath, InconsistentCapitalization, UnnecessaryAssignment
2. disable StringSplitter, EmptyBlockTag, EqualsGetClass, MissingSummary, BadImport

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

```bash
./build/make-distribution.sh --release
```

PR

total  202

```
  43 SynchronizeOnNonFinalField
  24 StaticAssignmentInConstructor
  21 JdkObsolete
  16 ThreadLocalUsage
  16 MutableConstantField
  16 MissingCasesInEnumSwitch
  11 UnusedMethod
  11 NonAtomicVolatileUpdate
   8 UnusedNestedClass
   8 NonOverridingEquals
   8 Finally
   4 MixedMutabilityReturnType
   4 DoubleBraceInitialization
   4 CatchAndPrintStackTrace
   4 CanonicalDuration
   2 ReferenceEquality
   1 ClassCanBeStatic
   1 ByteBufferBackingArray
```

Closes #2180 from cxzl25/error_prone_patch_followup.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-03 15:59:50 +08:00
zwangsheng
d1b30b2827 [CELEBORN-1208][WORKER] Unify parse uniqueId to WorkerInfo
### What changes were proposed in this pull request?

Unify parse uniqueId to WorkerInfo

### Why are the changes needed?

Keep parse uniqueId behavior consistent and avoid multiple changes

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #2202 from zwangsheng/CELEBORN-1208.

Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-02 21:27:58 +08:00
mingji
7be05b430b [CELEBORN-1133] Refactor fileinfo
### What changes were proposed in this pull request?
Rename FileWriter to PartitionLocationDataWriter, add storageManager, delete fileinfo, and flusher in the constructor.

FileInfo(userIdentifier,partitionSplitEnabled,fileMeta)
– NonMemoryFileInfo(streams,filePath,storageType,bytesFlushed)
– MemoryFileInfo(length,buffer)

FileMeta
– reduceFileMeta(chunkOffsets,sorted)
– mapFileMeta(bufferSize,numSubPartitions)

### Why are the changes needed?
1. To make concepts more clear.
2. To support memory storage and HDFS slot management.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster test with worker kill.

Closes #2130 from FMX/b1133.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-02 21:26:10 +08:00
Cheng Pan
77e468161d [CELEBORN-891] Remove pipeline feature for sort based writer
### What changes were proposed in this pull request?

Remove pipeline feature for sort based writer

### Why are the changes needed?

The pipeline feature is added as part of CELEBORN-295, for performance. Eventually, an unresolvable issue that would crash the JVM was identified in https://github.com/apache/incubator-celeborn/pull/1807, and after discussion, we decided to delete this feature.

### Does this PR introduce _any_ user-facing change?

No, the pipeline feature is disabled by default, there are no changes to users who use the default settings.

### How was this patch tested?

Pass GA.

Closes #2196 from pan3793/CELEBORN-891.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2024-01-01 10:42:17 +08:00
liangyongyuan
f8eb1605a1 [CELEBORN-1036][FOLLOWUP] When inflightBatchesPerAddress clear, totalInflightReqs should reset
### What changes were proposed in this pull request?
When   `inflightBatchesPerAddress`  clear in  `InFlightRequestTracker.cleanup `, `totalInflightReqs` should also reset to avoid getting stuck when exiting.

### Why are the changes needed?
`inflightBatchesPerAddress` has cleared and be empty,but totalInflightReqs is always bigger than 0.
![image](https://github.com/apache/incubator-celeborn/assets/46274164/28223f1e-ac9b-4e0b-a26d-9b529af6bca1)

This occurred during the first attempt of the task, where the request for map end failed, but the driver marked that the map has already ended.
![image](https://github.com/apache/incubator-celeborn/assets/46274164/7f43d808-2f9b-4775-b04f-30afe4d31e5a)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
through exists uts

Closes #2191 from lyy-pineapple/celebron-1036.

Authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-27 16:10:12 +08:00
SteNicholas
e7e39a51be
[CELEBORN-1189] Introduce RunningApplicationCount metric and /applications API to record running applications of worker
### What changes were proposed in this pull request?

Introduce `RunningApplicationCount` metric and `/applications` API to record running applications for Celeborn worker.

### Why are the changes needed?

`RunningApplicationCount` metrics only monitors the count of running applications in the cluster for master. Meanwhile, `/listTopDiskUsedApps` API lists the top disk usage application ids for master and worker. Therefore `RunningApplicationCount` metric and `/applications` API could be introduced to record running applications of worker.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Internal tests.

Closes #2172 from SteNicholas/CELEBORN-1189.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-27 09:51:16 +08:00
Angerszhuuuu
f751df50ba [CELEBORN-1192][BUG] Celeborn wait task timeout error message should show correct corresponding batch and target host and port
### What changes were proposed in this pull request?
Celeborn wait task timeout error message should show correct corresponding batch and target host and port

### Why are the changes needed?
Current error log here is confused, can't found out the target hostAndPushPort that have problem.

### Does this PR introduce _any_ user-facing change?
Refactor log help debug

### How was this patch tested?

Closes #2183 from AngersZhuuuu/CELEBORN-1192.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-12-22 16:53:30 +08:00
liangyongyuan
08e7b5962b [CELEBORN-1193] ResettableSlidingWindowReservoir should reset full to false
### What changes were proposed in this pull request?
When ResettableSlidingWindowReservoir reset,  it should reset` full` to `false`, `index` to `0`

### Why are the changes needed?
The ResettableSlidingWindowReservoir class, after invoking the reset operation, resets the data to zero, but fails to reset the 'index' and 'full' variables. Consequently, when retrieving a snapshot in the next operation, it is possible to obtain a considerable amount of zeros. This issue extends to the inaccurate calculation of metrics such as average and minimum values.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add uts

Closes #2182 from lyy-pineapple/slide-bug.

Lead-authored-by: liangyongyuan <2081248500@qq.com>
Co-authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-21 19:53:47 +08:00
liangyongyuan
4304be1a60 [CELEBORN-1172][SPARK] Support dynamic switch shuffle push write mode based on partition number
### What changes were proposed in this pull request?
Dynamically determine the writing mode in Spark based on the number of partitions.

### Why are the changes needed?
Enhance the flexibility of shuffle writes to improve performance.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add uts

Closes #2160 from lyy-pineapple/dynamic-write-mode.

Lead-authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-21 16:58:51 +08:00
zwangsheng
6c2fdf7477
[CELEBORN-1188][TEST] Using JUnit function instead of java assert
### What changes were proposed in this pull request?
Using Junit function instead of java assert.

### Why are the changes needed?
When java assert fail, will throw AssertException, which is hard to find diff.

![截屏2023-12-20 10 34 52](https://github.com/apache/incubator-celeborn/assets/52876270/b36421a5-64e1-4717-a6d4-3b08db403293)

Instead, when we use junit assert, we can clearly find diff.

![截屏2023-12-20 11 17 21](https://github.com/apache/incubator-celeborn/assets/52876270/ce39fa20-e9ab-4419-a4ca-62c4157e4b2c)

### Does this PR introduce _any_ user-facing change?
NO, only test changed

### How was this patch tested?
Run CI

Closes #2173 from zwangsheng/CELEBORN-1188.

Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-12-20 21:20:38 +08:00
sychen
7f653ce7d6 [CELEBORN-1190] Apply error prone patch and suppress some problems
### What changes were proposed in this pull request?
1.  Fix MissingOverride, DefaultCharset, UnnecessaryParentheses Rule
2. Exclude generated sources, FutureReturnValueIgnored, TypeParameterUnusedInFormals, UnusedVariable

### Why are the changes needed?
```
./build/make-distribution.sh --release
```
We get a lot of WARNINGs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2177 from cxzl25/error_prone_patch.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-12-20 20:54:18 +08:00
SteNicholas
089a0f8686
[CELEBORN-1036][FOLLOWUP] totalInflightReqs should decrement when batchIdSet contains the batchId to avoid duplicate caller of removeBatch
### What changes were proposed in this pull request?

`totalInflightReqs` decrements when `batchIdSet` contains the `batchId` to avoid duplicate caller of `removeBatch` in `InFlightRequestTracker`.

### Why are the changes needed?

Caller of `InFlightRequestTracker#removeBatch` may be duplicated, which cause that `totalInflightReqs` could be negative. The source of truth should be that `totalInflightReqs` should decrement when `batchIdSet` contains the `batchId`. If `batchIdSet` does not contain the `batchId`, it does not need to decrement `totalInflightReqs`.

```
23/12/05 20:05:01 [Executor task launch worker for task 17.0 in stage 10.0 (TID 206)] ERROR InFlightRequestTracker: After waiting for 1200000 ms, there are still -1 batches in flight for hostAndPushPort [], which exceeds the current limit 0.
23/12/05 20:05:01 [Executor task launch worker for task 17.0 in stage 10.0 (TID 206)] WARN InFlightRequestTracker: Clear InFlightRequestTracker
23/12/05 20:05:01 [Executor task launch worker for task 17.0 in stage 10.0 (TID 206)] ERROR Executor: Exception in task 17.0 in stage 10.0 (TID 206)
org.apache.celeborn.common.exception.CelebornIOException: Waiting timeout for task 4-17-0 while limiting zero in-flight requests
	at org.apache.celeborn.client.ShuffleClientImpl.limitZeroInFlight(ShuffleClientImpl.java:598)
	at org.apache.celeborn.client.ShuffleClientImpl.prepareForMergeData(ShuffleClientImpl.java:1175)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.close(HashBasedShuffleWriter.java:455)
	at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.write(HashBasedShuffleWriter.java:210)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:100)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:589)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1545)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:594)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Internal tests.

Closes #2134 from SteNicholas/CELEBORN-1036.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-20 18:11:07 +08:00
mingji
4dacf72a6d
[CELEBORN-1150] support io encryption for spark
### What changes were proposed in this pull request?
1. To support io encryption for spark.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and manually test on a cluster.

Closes #2135 from FMX/B1150.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-12-19 11:44:05 +08:00
Chandni Singh
b09febdd8c [CELEBORN-1176] Server side support for Sasl Auth
### What changes were proposed in this pull request?

This adds the server side Sasl authentication support in the transport layer. Most of this code is taken from Apache Spark.

### Why are the changes needed?

The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UTs.

Closes #2164 from otterc/CELEBORN-1176.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-18 11:27:28 +08:00
zky.zhoukeyong
e361788e48 [CELEBORN-1178] Destroy fail reserved slots in LifecycleManager#reserveSlotsWithRetry
### What changes were proposed in this pull request?
I'm testing main branch and encountered the following scenario.
I run `sbin/stop-worker.sh` near simultaneously on 3 out of 6 workers, and I'm expecting the 3 workers
will soon shutdown because I enabled graceful shutdown. However, only the first worker I stopped
shutdown in 15s as expected, the other two won't shutdown until shutdown timeout.

After digging into it, I found `LifecycleManager#reserveSlotsWithRetry` will reserve for the same location
twice:
1. At T1, only worker1 shutdown, pushes receive HARD_SPLIT and goes to revive
2. At T2, LifecycleManager handles revive requests in batch, and try to reallocate the locs to other workers
3. At T3, reserve to worker3 succeeds because it's not shutdown yet, but reserve to worker2 fails because it's shutdown
4. At T4, LifecycleManager will re-allocate the failed slots to other workers except worker1 and worker2. However, at this time Worker3 is also shutdown, so it fails to reserve on worker3
5. At T5, it re-allocates slots that failed to worker3. However, `getFailedPartitionLocations` will return slots allocated to worker3 in step 3, and increment the epoch to 2. At this time, worker3 has slots of epoch 1, but they will never to pushed to because newer epoch 3 is generated at the same time
6. Since the epoch 2 locs in worker3 will never be pushed to, it will never get a chance to return HARD_SPLIT, as a result it can't fast shutdown untile timeout.

This PR fixes this by destroying failed to be reserved slots in the process of `reserveSlotsWithRetry`

### Why are the changes needed?
ditto

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.

Before:
![image](https://github.com/apache/incubator-celeborn/assets/948245/50c55524-d37f-494e-a5aa-fba682438cda)
After:
![image](https://github.com/apache/incubator-celeborn/assets/948245/8c90a869-b388-46f3-a86b-a37fd0f4ce0f)

Closes #2163 from waitinfuture/1178.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-17 14:28:04 +08:00
Chandni Singh
600bd53616 [CELEBORN-1180] Changed the version of Sasl Auth related config to 0.5
### What changes were proposed in this pull request?
Changes the version of the config to 0.5 given that 0.4 will be released soon.

### Why are the changes needed?
See above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
NA

Closes #2165 from otterc/CELEBORN-1180.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-16 13:45:46 +08:00
zky.zhoukeyong
309153a99b [CELEBORN-1175] Add UT for commit files
### What changes were proposed in this pull request?
As title.

### Why are the changes needed?
As title.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passes UTs.

Closes #2162 from waitinfuture/1175-2.

Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-16 01:36:29 +08:00
zky.zhoukeyong
01feb93abb [CELEBORN-1167] Avoid calling parmap when destroy slots
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
One user reported that LifecycleManager's parmap can create huge number of threads and causes OOM.

![image](https://github.com/apache/incubator-celeborn/assets/948245/1e9a0b83-32fe-40d5-8739-2b370e030fc8)

There are four places where parmap is called:

1. When LifecycleManager commits files
2. When LifecycleManager reserves slots
3. When LifecycleManager setup connection to workers
4. When LifecycleManager call destroy slots

This PR fixes the fourth one. To be more detail, this PR eliminates `parmap` when destroying slots, and also replaces `askSync` with `ask`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test and GA.

Closes #2156 from waitinfuture/1167.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-15 18:30:31 +08:00
Chandni Singh
a03ce6c165 [CELEBORN-1157] Add client-side support for Sasl Authentication in the transport layer
### What changes were proposed in this pull request?
This adds the client side Sasl authentication support in the transport layer. Most of this code is taken from Apache Spark.

### Why are the changes needed?
The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
Added a configuration for Sasl request timeout

### How was this patch tested?
Will be adding `CelebornSaslSuiteJ.java` (https://github.com/apache/incubator-celeborn/pull/2105) that tests the end-to-end Sasl flow.

Closes #2139 from otterc/CELEBORN-1157.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-14 22:52:49 +08:00
Fu Chen
0f2a9a3a63 [CELEBORN-1160][FOLLOWUP] Update the version for celeborn.client.rpc.shared.threads to 0.3.2
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

Since we are backporting #2145 to branch-0.3, and the configuration entry `celeborn.client.rpc.shared.threads` in #2145
 has a start version of 0.4.0, this update aligns the version accordingly.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2153 from cfmcgrady/celeborn-1160-followup.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-13 15:12:50 +08:00
zky.zhoukeyong
92bebd305d [CELEBORN-1160] Avoid calling parmap when commit files
### What changes were proposed in this pull request?
As title

### Why are the changes needed?
One user reported that LifecycleManager's parmap can create huge number of threads and causes OOM.

![image](https://github.com/apache/incubator-celeborn/assets/948245/1e9a0b83-32fe-40d5-8739-2b370e030fc8)

There are four places where parmap is called:

1. When LifecycleManager commits files
2. When LifecycleManager reserves slots
3. When LifecycleManager setup connection to workers
4. When StorageManager calls close

This PR fixes the first one. To be more detail, this PR eliminates `parmap` when doing committing files, and also replaces `askSync` with `ask`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test and GA.

Closes #2145 from waitinfuture/1160.

Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-13 14:36:48 +08:00
wangshengjie
8516df4beb [CELEBORN-1151] Request slots when register shuffle should filter the workers excluded by application
### What changes were proposed in this pull request?
When request slots, filter workers excluded by application

### Why are the changes needed?
If worker alive but can not service, register shuffle will remove the worker from application client exclude list and next shuffle may reserve slots on this worker,this will cause application revive unexpectly

### Does this PR introduce _any_ user-facing change?
Yes, request slots will filter workers excluded by application

### How was this patch tested?
UT,

Closes #2131 from wangshengjie123/fix-request-slots-blacklist.

Authored-by: wangshengjie <wangshengjie3@xiaomi.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-12 10:02:18 +08:00
Erik.fang
87b64391ea [CELEBORN-1152] fix GetShuffleId RPC NPE for empty shuffle
### What changes were proposed in this pull request?

In [celeborn-955](https://github.com/apache/incubator-celeborn/pull/1924),  GetShuffleId RPC was introduced to generate a celeborn shuffle id from app shuffle id to support spark stage rerun
GetShuffleId RPC assumes that Shuffle Write operation always happens before Shuffle Read operation, but this is not true for empty shuffle data in celeborn, which causes GetShuffleId RPC to throw NPE and fail the Job
This PR fixes this bug

### Why are the changes needed?
to avoid spark job failure with empty shuffle data

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
a new test case is included for empty shuffle data

Closes #2136 from ErikFang/fix-GetShuffleId-RPC-NPE-for-empty-shuffle.

Lead-authored-by: Erik.fang <fmerik@gmail.com>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-11 20:13:26 +08:00
Chandni Singh
074597d05e [CELEBORN-1147] Added a dedicated API for RPC messages which also accepts an RpcResponseCallback instance
### What changes were proposed in this pull request?

Currently in `BaseMessageHandler` there is a single API for receive which is used for all messages. This makes handling messages when multiple handlers are added messy.

- req.body.release() is only invoked when the handler actually process the message and not delegates it.
- every handler will have to create an instance of RpcResponseCallback for Rpc messages which is exactly the same.

Instead, releasing the message body and creating a callback for Rpc messages can be done in TransportRequestHandler. This avoids:

- code duplication related to RpcResponseCallback in every RPC handler
- every new request handler doesn't need to release the request body. It will be always be done in TransportRequestHandler.

Please note that this is how it is in Apache Spark and with Sasl Authentication, we will add a SaslRpcHandler (https://github.com/apache/incubator-celeborn/pull/2105) which wraps the underlying message handler.

### Why are the changes needed?

The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs and added some more UTs.

Closes #2123 from otterc/CELEBORN-1147.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-09 02:02:15 +08:00
onebox-li
af6fd8a0e6 [CELEBORN-1127] Add JVM classloader metrics
### What changes were proposed in this pull request?
Add JVM classloader metrics for loaded and unloaded count.
![image](https://github.com/apache/incubator-celeborn/assets/19429353/c00eceb3-54e5-4f85-8df1-fe9a6adf6ad4)

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
Add two classloader-related panels.

### How was this patch tested?
Cluster test.

Closes #2099 from onebox-li/add-classloader-metrics.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-12-07 09:47:23 +08:00
qinrui
04a1e90207 [CELEBORN-1122] Metrics supports json format
### What changes were proposed in this pull request?
If the user does not use prometheus to collect monitoring metrics, but rather some other ones. Using metrics in JSON format would be more user-friendly.The PR supports JSON format for metrics.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
Metrics supports JSON format

### How was this patch tested?
Cluster test.

Closes #2089 from suizhe007/CELEBORN-1122.

Authored-by: qinrui <qr7972@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-12-06 09:24:28 +08:00
SteNicholas
406cef8392 [CELEBORN-1052][FOLLOWUP] Introduce dynamic ConfigService at SystemLevel and TenantLevel
### What changes were proposed in this pull request?

Follow up #2100. Mainly changes the package from scala to java of the codes in #2100. Meanwhile, `FsConfigServiceImpl#refresh` should directly return instead of refreshing configs.

### Why are the changes needed?

This PR follow up dynamic `ConfigService` at `SystemLevel` and `TenantLevel`, Dynamic configuration is a type of configuration that can be changed at runtime as needed in #2100. The implementation of `ConfigService` is based on Java codes, which are put into Scala package and cause that the spotless plugin does not format well. After the changes of the pull request, there are much code style changes generated from the package moving behavior.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`ConfigServiceSuiteJ`.

Closes #2125 from SteNicholas/CELEBORN-1052.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-12-04 19:03:59 +08:00
exmy
8a15396cb6 [CELEBORN-1145] Separate clientPushBufferMaxSize from CelebornInputStreamImpl
### What changes were proposed in this pull request?
The `clientPushBufferMaxSize` config is also used by `CelebornInputStreamImpl`, it's a config about push side and should not be used by fetch side. This pr introduces a fetch config to replace it.

### Why are the changes needed?

As above

### Does this PR introduce _any_ user-facing change?

Yes, a new config `celeborn.client.fetch.buffer.size` is introduced.

### How was this patch tested?

Pass CI

Closes #2118 from exmy/celeborn-1145.

Authored-by: exmy <xumovens@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-11-30 18:56:03 +08:00
Chandni Singh
d37bb4c682 [CELEBORN-1131] Add Client/Server bootstrap framework to transport layer
### What changes were proposed in this pull request?
This adds the client/server bootstrap framework to transport layer in Celeborn. This is copied from Spark.
It is part of the epic: https://issues.apache.org/jira/browse/CELEBORN-1011.

### Why are the changes needed?
The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Part of a larger change which has tests

Closes #2120 from otterc/CELEBORN-1131-PR1.

Lead-authored-by: Chandni Singh <singh.chandni@gmail.com>
Co-authored-by: otterc <singh.chandni@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-29 21:06:23 +08:00
SteNicholas
4dfcd9b56b [CELEBORN-1092] Introduce JVM monitoring in Celeborn Worker using JVMQuake
### What changes were proposed in this pull request?

Introduce JVM monitoring in Celeborn Worker using JVMQuake to enable early detection of memory management issues and facilitate fast failure.

### Why are the changes needed?

When facing out-of-control memory management in Celeborn Worker we typically use JVMkill as a remedy by killing the process and generating a heap dump for post-analysis. However, even with jvmkill protection, we may still encounter issues caused by JVM running out of memory, such as repeated execution of Full GC without performing any useful work during the pause time. Since the JVM does not exhaust 100% of resources, JVMkill will not be triggered. Therefore JVMQuake is introduced to provide more granular monitoring of GC behavior, enabling early detection of memory management issues and facilitating fast failure. Refers to the principle of [jvmquake](https://github.com/Netflix-Skunkworks/jvmquake) which is a JVMTI agent that attaches to your JVM and automatically signals and kills it when the program has become unstable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`JVMQuakeSuite`

Closes #2061 from SteNicholas/CELEBORN-1092.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-28 20:45:08 +08:00
mingji
113311df3e [CELEBORN-1081][FOLLOWUP] Remove UNKNOWN_DISK and allocate all slots to disk
### What changes were proposed in this pull request?
1. Remove UNKNOWN_DISK from StorageInfo.
2. Enable load-aware slots allocation when there is HDFS.

### Why are the changes needed?
To support the application's config about available storage types.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
GA and Cluster.

Closes #2098 from FMX/B1081-1.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-28 11:26:00 +08:00
Shuang
ad57c8b91e
[CELEBORN-1052] Introduce dynamic ConfigService at SystemLevel and TenantLevel
### What changes were proposed in this pull request?
This PR introduce dynamic ConfigService at SystemLevel and TenantLevel, Dynamic configuration is a type of configuration that can be changed at runtime as needed. It can be used at system level/tenant level. When applying dynamic configuration, the priority order is as follows: tenant level overrides system level, which in turn overrides static configuration(CelebornConf). This means that if a configuration is defined at the tenant level, it will be used instead of the system level or static configuration(CelebornConf). If the tenant-level configuration is missing,
the system-level configuration will be used. If the system-level configuration is also missing, CelebornConf
will be used as the default value.

There are several other tasks related to this feature that will be implemented in the future.

- [ ]  [Add isDynamic property for CelebornConf](https://issues.apache.org/jira/browse/CELEBORN-1051)
- [ ]  [Support DB based Configserver](https://issues.apache.org/jira/browse/CELEBORN-1054)
- [ ]  [Add restAPI for configuration management](https://issues.apache.org/jira/browse/CELEBORN-1056)

### Why are the changes needed?
The current configuration of the server (CelebornConf) is static. When the configuration is changed, the service needs to be restarted. This PR introduces a dynamic configuration solution. The server side can use dynamic configuration as needed. At the same time, it is considered that the tenant level will be supported in the future (such as supporting tenant level dynamic quota control) configuration, so this time we will also consider supporting dynamic tenant-level configuration, and this PR will provide a default implementation based on the file system.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #2100 from RexXiong/CELEBORN-1052.

Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-27 12:17:05 +08:00
Erik.fang
aee41555c6 [CELEBORN-955] Re-run Spark Stage for Celeborn Shuffle Fetch Failure
### What changes were proposed in this pull request?
Currently, Celeborn uses replication to handle shuffle data lost for celeborn shuffle reader, this PR implements an alternative solution by Spark stage resubmission.

Design doc:
https://docs.google.com/document/d/1dkG6fww3g99VAb1wkphNlUES_MpngVPNg8601chmVp8/edit

### Why are the changes needed?
Spark stage resubmission uses less resources compared with replication, and some Celeborn users are also asking for it

### Does this PR introduce _any_ user-facing change?
a new config celeborn.client.fetch.throwsFetchFailure is introduced to enable this feature

### How was this patch tested?
two UTs are attached, and we also tested it in Ant Group's Dev spark cluster

Closes #1924 from ErikFang/Re-run-Spark-Stage-for-Celeborn-Shuffle-Fetch-Failure.

Lead-authored-by: Erik.fang <fmerik@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-26 16:47:58 +08:00
Chandni Singh
788b0c340b [CELEBORN-1135] Added tests for the RpcEnv and related classes
### What changes were proposed in this pull request?
Added test suites for `RpcEnv`, `NettyRpcEnv`, and other related classes.
These are copied over from Apache Spark. Some of the UTs in Apache Spark required changes in the source code like [SPARK-39468](https://issues.apache.org/jira/browse/SPARK-39468) which I didn't copy over.

### Why are the changes needed?
The change adds unit tests.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Just adds UTs. The source code changes are minimal.

Closes #2107 from otterc/CELEBORN-1135.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-24 09:57:04 +08:00
SteNicholas
60871750e4
[CELEBORN-1136] Support policy for master to assign slots fallback to roundrobin with no available slots
### What changes were proposed in this pull request?

`SlotsAllocator` supports policy for master to assign slots fallback to roundrobin with no available slots.

### Why are the changes needed?

When the selected workers have no available slots, the loadaware policy could throw `MasterNotLeaderException`. It's recommended to support policy for master to assign slots fallback to roundrobin with no available slots. Meanwhile, the situation that there is no available slots would occur when the partition size has increased a lot in a short period of time.
```
Caused by: org.apache.celeborn.common.haclient.MasterNotLeaderException: Master:xx.xx.xx.xx:9099 is not the leader. Suggested leader is Master:xx.xx.xx.xx:9099. Exception:bound must be positive.
    at org.apache.celeborn.service.deploy.master.clustermeta.ha.HAHelper.sendFailure(HAHelper.java:58)
    at org.apache.celeborn.service.deploy.master.Master.executeWithLeaderChecker(Master.scala:236)
    at org.apache.celeborn.service.deploy.master.Master$$anonfun$receiveAndReply$1.applyOrElse(Master.scala:314)
    ... 7 more
Caused by: java.lang.IllegalArgumentException: bound must be positive
    at java.util.Random.nextInt(Random.java:388)
    at org.apache.celeborn.service.deploy.master.SlotsAllocator.roundRobin(SlotsAllocator.java:202)
    at org.apache.celeborn.service.deploy.master.SlotsAllocator.offerSlotsLoadAware(SlotsAllocator.java:151)
    at org.apache.celeborn.service.deploy.master.Master.$anonfun$handleRequestSlots$1(Master.scala:598)
    at org.apache.celeborn.common.metrics.source.AbstractSource.sample(AbstractSource.scala:199)
    at org.apache.celeborn.common.metrics.source.AbstractSource.sample(AbstractSource.scala:189)
    at org.apache.celeborn.service.deploy.master.Master.handleRequestSlots(Master.scala:587)
    at org.apache.celeborn.service.deploy.master.Master$$anonfun$receiveAndReply$1.$anonfun$applyOrElse$12(Master.scala:314)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.celeborn.service.deploy.master.Master.executeWithLeaderChecker(Master.scala:233)
    ... 8 more
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`SlotsAllocatorSuiteJ#testAllocateSlotsWithNoAvailableSlots`

Closes #2108 from SteNicholas/CELEBORN-1136.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-22 14:08:06 +08:00
SteNicholas
a275b64b32
[CELEBORN-1137] Correct suggested leader of exception message for MasterNotLeaderException
### What changes were proposed in this pull request?

`MasterNotLeaderException` corrects the suggested leader of exception message.

### Why are the changes needed?

When current peer isn't the leader of master and the leader is switching which cache isn't expired, the suggested leader of exception message in MasterNotLeaderException is confusing that the suggested leader is current peer. It's recommened to correct suggested leader of exception message for MasterNotLeaderException if current peer is equal to the suggested leader.
```
Caused by: org.apache.celeborn.common.haclient.MasterNotLeaderException: Master:xx.xx.xx.xx:9099 is not the leader. Suggested leader is Master:xx.xx.xx.xx:9099. Exception:bound must be positive.
    at org.apache.celeborn.service.deploy.master.clustermeta.ha.HAHelper.sendFailure(HAHelper.java:58)
    at org.apache.celeborn.service.deploy.master.Master.executeWithLeaderChecker(Master.scala:236)
    at org.apache.celeborn.service.deploy.master.Master$$anonfun$receiveAndReply$1.applyOrElse(Master.scala:314)
    ... 7 more
Caused by: java.lang.IllegalArgumentException: bound must be positive
    at java.util.Random.nextInt(Random.java:388)
    at org.apache.celeborn.service.deploy.master.SlotsAllocator.roundRobin(SlotsAllocator.java:202)
    at org.apache.celeborn.service.deploy.master.SlotsAllocator.offerSlotsLoadAware(SlotsAllocator.java:151)
    at org.apache.celeborn.service.deploy.master.Master.$anonfun$handleRequestSlots$1(Master.scala:598)
    at org.apache.celeborn.common.metrics.source.AbstractSource.sample(AbstractSource.scala:199)
    at org.apache.celeborn.common.metrics.source.AbstractSource.sample(AbstractSource.scala:189)
    at org.apache.celeborn.service.deploy.master.Master.handleRequestSlots(Master.scala:587)
    at org.apache.celeborn.service.deploy.master.Master$$anonfun$receiveAndReply$1.$anonfun$applyOrElse$12(Master.scala:314)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.celeborn.service.deploy.master.Master.executeWithLeaderChecker(Master.scala:233)
    ... 8 more
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2109 from SteNicholas/CELEBORN-1137.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-22 10:26:53 +08:00
Fu Chen
aab073ab16
[CELEBORN-1125] Bump guava from 14.0.1 to 32.1.3-jre
### What changes were proposed in this pull request?

As title

### Why are the changes needed?

- bump guava from 14.0.1 to 32.1.3-jre
- refer to https://github.com/apache/spark/pull/26911, remove usages of Guava that no longer work in Guava 27/32, and replace with workalikes. After this PR, Celeborn no longer relies on a specific version of Guava, and is compatible with Guava 14/27/32. we have the ability to specify Guava to 27 when running MapReduce integration tests.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #2090 from cfmcgrady/guava-27.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-21 16:18:14 +08:00
onebox-li
b5c5aa6d9d [CELEBORN-1121] Improve WorkerInfo#hashCode method
### What changes were proposed in this pull request?
Change WorkerInfo#hashCode() from map+foldLeft to while and cache.

Test the each way to calculate, code and result show as below:
```
val state = Seq(host, rpcPort, pushPort, fetchPort, replicatePort)
// origin
val originHash = state.map(_.hashCode()).foldLeft(0)((a, b) => 31 * a + b)

// for
var forHash = 0
for (i <- state) {
  forHash = 31 * forHash + i.hashCode()
}

// while
var whileHash = 0
var i = 0
while (i < state.size) {
  whileHash = 31 * whileHash + state(i).hashCode()
  i = i + 1
}
```
Result:
```
java version "1.8.0_261"
origin hash result = -831724440, costs 1103914 ns
for hash result = -831724440, costs 444588 ns (2.5x)
while hash result = -831724440, costs 46510 ns (23x)
```

### Why are the changes needed?
The current WorkerInfo's hashCode() is a little time-consuming. Since it is widely used in lots of hash maps, it needs to be improved.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #2086 from onebox-li/improve-worker-hash.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
2023-11-17 10:31:57 +08:00
mingji
02cea042a0 [CELEBORN-1116] Read authentication configs from HADOOP_CONF_DIR
### What changes were proposed in this pull request?
1. Make Celeborn read configs from HADOOP_COND_DIR.
2. Remove unnecessary Kerberos configs.

### Why are the changes needed?
To support HDFS with Kerberos.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster.

Closes #2082 from FMX/B1116.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-09 11:07:13 +08:00
Shuang
931880a82d [CELEBORN-1112] Inform celeborn application is shutdown, then celeborn cluster can release resource immediately
### What changes were proposed in this pull request?
Unregister application to Celeborn master After Application stopped, then master will expire the related shuffle resource immediately, resulting in resource savings.

### Why are the changes needed?
Currently Celeborn master expires the related application shuffle resource only when application is being checked timeout,
this would greatly delay the release of resources, which is not conducive to saving resources.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
PASS GA

Closes #2075 from RexXiong/CELEBORN-1112.

Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-08 20:46:51 +08:00
onebox-li
b7e4dc4339 [CELEBORN-1114] Remove allocationBuckets from WorkerInfo and refactor SLOTS_ALLOCATED metrics
### What changes were proposed in this pull request?
Currently, `WorkerInfo` is used in many places, and allocationBuckets is only used when its own workers want to collect metrics `SLOTS_ALLOCATED`. If there are lots of workers in the RSS cluster, there may be a certain amount of memory waste, each `WorkerInfo` maintain a Array\[Int](61), so remove it from `WorkerInfo`.
And refactor the metrics `SLOTS_ALLOCATED` from gauge to counter. Originally, this metrics is approximately one hour's total only if there are continuous tasks. Now refactoring it into a counter can reduce the cost of maintaining time windows, including storage and timely expiration data, etc. It can also be more flexibly transformed according to user needs on the prometheus side.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
Yes. metrics_SlotsAllocated_Count metrics change from gauge for 1 hour to a increasing counter.

### How was this patch tested?
Cluster test.

Closes #2078 from onebox-li/improve-SlotsAllocated.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-08 19:45:47 +08:00
SteNicholas
d2582919ad
[CELEBORN-1110] Support celeborn.worker.storage.disk.reserve.ratio to configure worker reserved ratio for each disk
### What changes were proposed in this pull request?

Support `celeborn.worker.storage.disk.reserve.ratio` to configure worker reserved ratio for each disk.

### Why are the changes needed?

`CelebornConf` supports to configure celeborn worker reserved space for each disk, which space is absolute. `CelebornConf` could support `celeborn.worker.storage.disk.reserve.ratio` to configure worker reserved ratio for each disk. The minimum usable size for each disk should be the max space between the reserved space and the space calculate via reserved ratio.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`SlotsAllocatorSuiteJ`

Closes #2071 from SteNicholas/CELEBORN-1110.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-08 12:39:25 +08:00
SteNicholas
52eddc59f3
[CELEBORN-448] Support exclude worker manually
### What changes were proposed in this pull request?

Support exclude worker manually given worker id. This worker is added into excluded workers manually.

### Why are the changes needed?

Celeborn supports to shuffle client-side fetch and push exclude workers on failure at present. It's necessary to exclude worker manually for maintaining the Celeborn cluster.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `HttpUtilsSuite`
- `DefaultMetaSystemSuiteJ#testHandleWorkerExclude`
- `RatisMasterStatusSystemSuiteJ#testHandleWorkerExclude`
- `MasterStateMachineSuiteJ#testObjSerde`

Closes #1997 from SteNicholas/CELEBORN-448.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-07 16:25:24 +08:00
joey.ljy
455cd40137 [CELEBORN-1111] Supporting connection to HDFS with Kerberos authentication enabled
### What changes were proposed in this pull request?
Adding Kerberos support for HDFS storage type.

The following five parameters need to be configured:
| key | value |
| :--: | :--: |
| celeborn.storage.hdfs.kerberos.enabled | true |
| celeborn.storage.hdfs.kerberos.principal | userREALM |
| celeborn.storage.hdfs.kerberos.keytab | /path/test.keytab |
| celeborn.hadoop.hadoop.security.authorization | kerberos |
| celeborn.hadoop.dfs.namenode.kerberos.principal | hdfs/_HOSTREALM |

### Why are the changes needed?
Connecting to HDFS with Kerberos enabled requires support for keytab login.

### Does this PR introduce _any_ user-facing change?
Add 3 configurations.
celeborn.storage.hdfs.kerberos.enabled
celeborn.storage.hdfs.kerberos.principal
celeborn.storage.hdfs.kerberos.keytab

### How was this patch tested?
Test in Kerberos enabled HDFS cluster.

Closes #2072 from liujiayi771/hdfs-kerberos.

Authored-by: joey.ljy <joey.ljy@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-04 17:21:41 +08:00
mingji
5e77b851c9 [CELEBORN-1081] Client support celeborn.storage.activeTypes config
### What changes were proposed in this pull request?
1.To support `celeborn.storage.activeTypes` in Client.
2.Master will ignore slots for "UNKNOWN_DISK".

### Why are the changes needed?
Enable client application to select storage types to use.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
GA and cluster.

Closes #2045 from FMX/B1081.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
2023-11-03 20:03:11 +08:00
Chandni Singh
c8b5384baf [CELEBORN-1107] Make the max default number of netty threads configurable
### What changes were proposed in this pull request?
This change makes the maximum default number of Netty threads configurable. Previously, this value was hardcoded to 64, which could be small for certain environments. While it's possible to configure the number of Netty server and client threads individually for each module, providing an option to increase the default value offers greater convenience.

### Why are the changes needed?
The change offers convenience.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a UT

Closes #2065 from otterc/CELEBORN-1107.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-03 13:18:44 +08:00
onebox-li
7b185a2562 [CELEBORN-1058] Support specifying the number of dispatcher threads for each role
### What changes were proposed in this pull request?
Support specifying the number of dispatcher threads for each role, especially shuffle client side. For shuffle client, there is only RpcEndpointVerifier endpoint which handles not many requests, one thread is enough. The rpc env of other roles has only two endpoints at most, using a shared event loop is reasonable. I am not sure if there is a need to add rpc requests to shuffle client. So add specific parameters to specify the dispatcher threads here.

And change the dispatcher thread pool name in order to distinguish it from spark's.

### Why are the changes needed?
Ditto

### Does this PR introduce _any_ user-facing change?
Yes, add params celeborn.\<role>.rpc.dispatcher.threads

### How was this patch tested?
Manual test and UT

Closes #2003 from onebox-li/my_dev.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-03 10:35:54 +08:00
SteNicholas
4e8e8c2310
[CELEBORN-1094] Optimize mechanism of ChunkManager expired shuffle key cleanup to avoid memory leak
### What changes were proposed in this pull request?

The `cleaner` of `Worker` executes the `StorageManager#cleanupExpiredShuffleKey` to clean expired shuffle keys with daemon cached thread pool. The optimization speeds up cleaning including expired shuffle keys of ChunkManager to avoid memory leak.

### Why are the changes needed?

`ChunkManager#streams` could lead memory leak when the speed of cleanup is slower than expiration for expired shuffle of worker. The behavior that `ChunkStreamManager` cleanup expired shuffle key should be optimized to avoid memory leak, which causes that the VM thread of worker is 100%.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`WorkerSuite#clean up`.

Closes #2053 from SteNicholas/CELEBORN-1094.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-02 15:46:07 +08:00
SteNicholas
b45b63f9a5
[CELEBORN-247][FOLLOWUP] Add metrics for each user's quota usage of Celeborn Worker
### What changes were proposed in this pull request?

Add the metric `ResourceConsumption` to monitor each user's quota usage of Celeborn Worker.

### Why are the changes needed?

The metric `ResourceConsumption` supports to monitor each user's quota usage of Celeborn Master at present. The usage of Celeborn Worker also needs to monitor.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Internal tests.

Closes #2059 from SteNicholas/CELEBORN-247.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
2023-11-01 15:48:31 +08:00
onebox-li
320714bf24 [CELEBORN-1089] Seperate overHighWatermark check to a dedicated thread
### What changes were proposed in this pull request?
Seperate `overHighWatermark` check to a dedicated thread, let this value can shared better and lighten `CongestionController#isUserCongested` logic.

### Why are the changes needed?
Ditto.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test and UT.

Closes #2041 from onebox-li/congest-check.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-01 09:51:24 +08:00