### What changes were proposed in this pull request?
### Why are the changes needed?
`CELEBORN-1320` uses `ReviveManager` to batch-process SOFT_SPLIT event RPCs, so `partitionSplitPool` is no longer used and the configuration item `celeborn.client.push.splitPartition.threads` is now meaningless.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2396 from cxzl25/CELEBORN-1336.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
To fix a typo.
### Why are the changes needed?
To maintain the quality of Celeborn documentation.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
N/A
Closes #2397 from ForVic/forvic/fix_typo.
Authored-by: ForVic <victor.lakers0@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix typo of `celeborn.network.bind.preferIpAddress` doc from `ture` to `true`.
### Why are the changes needed?
The doc of `celeborn.network.bind.preferIpAddress` contains the typo `ture`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2392 from SteNicholas/prefer-ip-address.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
While SortBasedWriter has a smaller memory footprint than HashBasedWriter, it suffers from performance issues when there are many partitions and the write buffer quickly fills up with small chunks of data.
For example, suppose the sort buffer size is 32K and there are 4 partitions with 128K of data in total, arriving as partitions A, B, C, D with 8K per partition each time. In this case, each partition's data has to be compressed and sent as a small 8K chunk 4 times, which is very costly. HashBasedWriter does not have this problem, since a push only happens when the per-partition buffer is full. A larger sort buffer size can mitigate the issue, but tuning the sort buffer size per job is tedious.
This PR introduces a new feature that measures the total pushed bytes and push count, as well as the "should-push" bytes and count. "Should-push" means that the pushed data is larger than CLIENT_PUSH_BUFFER_MAX_SIZE (in other words, a push would be triggered even with HashBasedWriter in this case).
When actualPushedBytes/actualPushedCounts > (1 + Threshold) * (ShouldPushBytes/ShouldPushCounts), the sort buffer size is enlarged by 1X so that more data is buffered before pushing. The maximum sort buffer size is capped at # of partitions * CLIENT_PUSH_BUFFER_MAX_SIZE.
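To make the enlargement rule concrete, here is a minimal Java sketch that applies the condition exactly as written above to counters supplied by the caller. How those counters are collected is not modeled. The class, method and parameter names are hypothetical, and "enlarge by 1X" is read as doubling the sort buffer size.

```java
// Hypothetical sketch of the adaptive sort-buffer rule; not Celeborn's actual code.
final class AdaptiveSortBufferSketch {
    private AdaptiveSortBufferSketch() {}

    /**
     * Returns the next sort buffer size. The condition mirrors the description:
     * actualPushedBytes / actualPushedCount > (1 + threshold) * (shouldPushBytes / shouldPushCount).
     */
    static long nextSortBufferSize(
            long currentSize,
            long actualPushedBytes, long actualPushedCount,
            long shouldPushBytes, long shouldPushCount,
            double threshold, int numPartitions, long pushBufferMaxSize) {
        // Without statistics on both sides there is nothing to compare yet.
        if (actualPushedCount == 0 || shouldPushCount == 0) {
            return currentSize;
        }
        double actualAvg = (double) actualPushedBytes / actualPushedCount;
        double shouldAvg = (double) shouldPushBytes / shouldPushCount;
        // Upper bound: # of partitions * CLIENT_PUSH_BUFFER_MAX_SIZE.
        long cap = (long) numPartitions * pushBufferMaxSize;
        if (actualAvg > (1 + threshold) * shouldAvg) {
            // "Enlarge by 1X" read as doubling, never exceeding the cap.
            return Math.min(currentSize * 2, cap);
        }
        return currentSize;
    }
}
```

With the example above (32K sort buffer, 4 partitions) and, as an assumption, a push buffer max size of 64K, the cap is 4 * 64K = 256K, so repeated enlargements would take the buffer from 32K to 64K, 128K and 256K, and then stop.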
### Why are the changes needed?
To reduce the performance cost of SortBasedWriter.
### Does this PR introduce _any_ user-facing change?
No, but two extra configurations are introduced.
### How was this patch tested?
In production at our company, and with unit tests.
Closes #2358 from CodingCat/adaptive_memory_threshold.
Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `ShutdownWorkerCount` metric to record the count of workers in shutdown list.
<img width="1432" alt="image" src="https://github.com/apache/incubator-celeborn/assets/10048174/bc84b281-30ca-40a1-92e4-fb9cf10b5aeb">
### Why are the changes needed?
At present, `/shutdownWorkers` lists all shutdown workers of the master. Therefore, it's recommended to introduce a `ShutdownWorkerCount` metric to record the number of workers in the shutdown list.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- [Celeborn Dashboard](https://stenicholas.grafana.net/public-dashboards/c44822917403401690edb15617ec9f08)
Closes #2379 from SteNicholas/CELEBORN-1323.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Rename `LostWorkers` metric to `LostWorkerCount` to align the naming style of other worker count metrics.
### Why are the changes needed?
The name of the `LostWorkers` metric differs from the other metrics of `MasterSource`, such as `WorkerCount` and `ExcludedWorkerCount`, so it should be renamed to `LostWorkerCount` to align the naming style.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2378 from SteNicholas/CELEBORN-1322.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
This enables a Celeborn Worker to retrieve the application meta from the Master if it hasn't received the secret from the Master before the application attempts to connect to it. Additionally, the Celeborn Worker's SecretRegistry has been converted into an LRU cache to prevent unbounded growth of the registry.
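The LRU behavior can be illustrated with `java.util.LinkedHashMap` in access order, combined with a fallback that asks the Master on a cache miss. This is a minimal sketch under stated assumptions: `LruSecretRegistry`, its methods, and the `fetchFromMaster` function (standing in for the Worker-to-Master RPC) are hypothetical names, not the actual `SecretRegistry` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not Celeborn's SecretRegistry implementation.
final class LruSecretRegistry {
    private final Map<String, String> secrets;
    // Stands in for the RPC that retrieves application meta from the Master.
    private final Function<String, String> fetchFromMaster;

    LruSecretRegistry(int maxEntries, Function<String, String> fetchFromMaster) {
        // accessOrder = true keeps the least recently accessed entry first,
        // so removeEldestEntry evicts it once the size exceeds maxEntries.
        this.secrets = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
        this.fetchFromMaster = fetchFromMaster;
    }

    /** Secret pushed by the Master ahead of time. */
    synchronized void register(String appId, String secret) {
        secrets.put(appId, secret);
    }

    /** Returns the secret, asking the Master if it has not been received yet. */
    synchronized String getSecret(String appId) {
        String secret = secrets.get(appId);
        if (secret == null) {
            // Simplification: the remote call is made while holding the lock.
            secret = fetchFromMaster.apply(appId);
            if (secret != null) {
                secrets.put(appId, secret);
            }
        }
        return secret;
    }

    /** containsKey does not change the access order. */
    synchronized boolean contains(String appId) {
        return secrets.containsKey(appId);
    }
}
```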
### Why are the changes needed?
This is the last change needed for Auth support in Celeborn (https://issues.apache.org/jira/browse/CELEBORN-1011).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added UTs. This is part of a bigger change, which will be tested end-to-end.
Closes #2363 from otterc/CELEBORN-1179.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix style and Gluten link in Developers Doc.
### Why are the changes needed?
- `slotsallocation.md` has the following wrong style:
<img width="1434" alt="image" src="https://github.com/apache/incubator-celeborn/assets/10048174/97fb53ed-473d-4f3d-8231-1fb613df9132">
- Gluten is an Apache incubating project, so the link to the Gluten project should be [Gluten](https://github.com/apache/incubator-gluten).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2375 from SteNicholas/developers-doc.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix license style of `quota_management.md`.
### Why are the changes needed?
The license style of `quota_management.md` is wrong.
<img width="1438" alt="image" src="https://github.com/apache/incubator-celeborn/assets/10048174/4a00724d-5fec-4b25-b134-d814c3152efd">
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2374 from SteNicholas/CELEBORN-1284.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
This enables Celeborn Master to persist application meta in Ratis and also push it to Celeborn Workers when it receives the requests for slots from the LifecycleManager.
### Why are the changes needed?
This change is required for adding authentication. ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added some UTs.
Closes #2346 from otterc/CELEBORN-1234.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Add `execution.batch-shuffle-mode: ALL_EXCHANGES_BLOCKING` to the `Flink Configuration` of `Deploy Flink client` in `deploy.md`.
### Why are the changes needed?
#2106 added validation that `execution.batch-shuffle-mode` is `ALL_EXCHANGES_BLOCKING`, so the `Flink Configuration` of `Deploy Flink client` should also include this configuration.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2355 from SteNicholas/CELEBORN-1134.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support Spark2.4 with Scala2.12 in `sbt.md`. Meanwhile, the CI workflow adds the test for Spark2.4 and Scala2.12.
Follow up #2344.
### Why are the changes needed?
Spark2.4 with Scala2.12 is supported.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #2345 from SteNicholas/CELEBORN-1298.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
This adds an internal port and auth support to Celeborn Workers.
1. Internal port is used by a worker to receive messages from Celeborn Master.
2. Authentication support for secure communication with clients. This change doesn't add support in clients for communicating with the Workers securely; that will be in a future change.
This change targets just adding the port and auth support to Worker. The following items from the proposal are still pending:
- Persisting the app secrets in Ratis.
- Forwarding secrets to Workers and having ability for the workers to pull registration info from the Master.
- Secured communication between workers and clients.
### Why are the changes needed?
It is needed for adding authentication support to Celeborn ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011))
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Part of a bigger change. For this change, only modified existing UTs.
Closes #2292 from otterc/CELEBORN-1256.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Add document about QuotaManager based on ConfigService
### Why are the changes needed?
Add document about QuotaManager based on ConfigService
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Closes #2325 from AngersZhuuuu/CELEBORN-1284.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add trace mark symbol.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2338 from FMX/B1295.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `configuration.md` to document dynamic config and config service.
### Why are the changes needed?
`DynamicConfig` and `ConfigService` have been supported since #2100 and should be documented to introduce the feature.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2336 from SteNicholas/CELEBORN-1286.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Deprecate the `celeborn.quota.configuration.path` config. Use `celeborn.dynamicConfig.store.fs.path` instead.
### Why are the changes needed?
`DefaultQuotaManager` was removed in #2298, which makes `celeborn.quota.configuration.path` useless. `celeborn.quota.configuration.path` can therefore be deprecated in favor of `celeborn.dynamicConfig.store.fs.path` for configuring quota.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2339 from SteNicholas/CELEBORN-1239.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Introduce `celeborn.dynamicConfig.store.fs.path` config to configure the path of dynamic config file for fs store backend.
### Why are the changes needed?
At present, `FsConfigServiceImpl` uses `celeborn.quota.configuration.path` to configure the path of the dynamic config file for the fs store backend. The path of the dynamic config file should be configured with `celeborn.dynamicConfig.store.fs.path` instead of the quota configuration path.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2337 from SteNicholas/CELEBORN-1296.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Optimize the handling of exceptions during the push of replica data, now only throwing PUSH_DATA_CONNECTION_EXCEPTION_REPLICA in specific scenarios.
### Why are the changes needed?
When the worker handles exceptions related to pushing replica data, unmatched exceptions such as 'file already closed' are uniformly transformed into REPLICATE_DATA_CONNECTION_EXCEPTION_COUNT and returned to the client. The client then excludes the peer node based on this count, which is not appropriate in certain scenarios. For instance, a 'file already closed' exception typically occurs during multiple splits and commitFile operations, and excluding a large number of nodes in such circumstances is clearly not expected.
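A minimal sketch of the narrower mapping, assuming (hypothetically) that only `java.net.ConnectException` and `java.net.SocketTimeoutException` anywhere in the cause chain count as connection failures. `ReplicaErrorClassifier` and `OTHER_REPLICA_FAILURE` are illustrative names; the actual worker code and its status codes may differ.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical sketch: which exceptions count as "connection" failures is an assumption.
final class ReplicaErrorClassifier {
    enum ReplicaStatus {
        // Connection problem with the peer: the client may exclude the peer worker.
        PUSH_DATA_CONNECTION_EXCEPTION_REPLICA,
        // Hypothetical catch-all for errors such as "file already closed":
        // reported as a failure, but not a reason to exclude the peer.
        OTHER_REPLICA_FAILURE
    }

    private ReplicaErrorClassifier() {}

    static ReplicaStatus classify(Throwable error) {
        // Walk the cause chain, bounded in case of a cyclic chain.
        Throwable t = error;
        for (int depth = 0; t != null && depth < 16; depth++) {
            if (t instanceof ConnectException || t instanceof SocketTimeoutException) {
                return ReplicaStatus.PUSH_DATA_CONNECTION_EXCEPTION_REPLICA;
            }
            t = t.getCause();
        }
        return ReplicaStatus.OTHER_REPLICA_FAILURE;
    }
}
```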

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Through existing UTs.
Closes #2323 from lyy-pineapple/CELEBORN-1282.
Authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Add `celeborn.quota.enabled` on the Master and Client side to enable quota checking.
### Why are the changes needed?
`celeborn.quota.enabled` should be added on the Master and Client side to enable quota checks for Celeborn Master and Client.
### Does this PR introduce _any_ user-facing change?
Add the `master` and `client` categories to `celeborn.quota.enabled`.
### How was this patch tested?
No.
Closes #2318 from AngersZhuuuu/CELEBORN-1277.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
https://github.com/apache/incubator-celeborn/pull/2292#discussion_r1497160753
Based on the above discussion, this removes the additional secured port. The existing port will be used for secured communication when auth is enabled.
### Why are the changes needed?
These changes are for enabling authentication.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
This removed additional secured port.
Closes #2327 from otterc/CELEBORN-1257.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Change the default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` from `LEVELDB` to `ROCKSDB`.
### Why are the changes needed?
Because LevelDB support will be removed, the default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` is changed to `ROCKSDB` instead of `LEVELDB` in preparation for the LevelDB deprecation.
Backport:
[[SPARK-45351][CORE] Change spark.shuffle.service.db.backend default value to ROCKSDB](https://github.com/apache/spark/pull/43142)
[[SPARK-45413][CORE] Add warning for prepare drop LevelDB support](https://github.com/apache/spark/pull/43217)
### Does this PR introduce _any_ user-facing change?
The default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` is changed from `LEVELDB` to `ROCKSDB`.
### How was this patch tested?
No.
Closes #2320 from SteNicholas/CELEBORN-1280.
Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce the REST API `/listDynamicConfigs` to list the dynamic configs. The result of `/listDynamicConfigs` is as follows:
```
=========================== Dynamic Configuration ============================
celeborn.master.ha.ratis.raft.server.snapshot.auto.trigger.threshold 100000
celeborn.worker.flusher.buffer.size 64k
=========================== SYSTEM ============================
celeborn.master.ha.ratis.raft.server.snapshot.auto.trigger.threshold 200000
celeborn.worker.flusher.buffer.size 128k
=========================== TENANT ============================
=========================== Tenant: tenantId1 ============================
celeborn.master.ha.ratis.raft.server.snapshot.auto.trigger.threshold 300000
celeborn.worker.flusher.buffer.size 256k
=========================== TENANT_USER ============================
=========================== Tenant: tenantId1, Name: user1 ============================
celeborn.master.ha.ratis.raft.server.snapshot.auto.trigger.threshold 400000
celeborn.worker.flusher.buffer.size 512k
```
### Why are the changes needed?
Celeborn supports dynamic configuration with `ConfigService` at present. It's recommended to introduce a REST API for dynamic configuration management.
### Does this PR introduce _any_ user-facing change?
- Introduce Rest API of listing dynamic configuration: `/listDynamicConfigs?level=[system|tenant|tenant_user]&tenant=tenantId1&name=user1`.
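For illustration, a small Java helper that builds the request path with the optional `level`, `tenant` and `name` query parameters shown above. The helper itself is hypothetical; only the path, the parameter names and the level values come from this description. It assumes that omitting all parameters is valid, and it uses `URLEncoder.encode(String, Charset)`, which requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper; only the path and parameter names come from the PR text.
final class DynamicConfigQuery {
    private static final Set<String> LEVELS = Set.of("system", "tenant", "tenant_user");

    private DynamicConfigQuery() {}

    /** Builds /listDynamicConfigs with the optional level, tenant and name parameters. */
    static String listDynamicConfigsPath(String level, String tenant, String name) {
        if (level != null && !LEVELS.contains(level)) {
            throw new IllegalArgumentException("Unknown level: " + level);
        }
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        if (level != null) params.put("level", level);
        if (tenant != null) params.put("tenant", tenant);
        if (name != null) params.put("name", name);
        if (params.isEmpty()) {
            return "/listDynamicConfigs";
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return "/listDynamicConfigs?" + query;
    }
}
```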
### How was this patch tested?
- `HttpUtilsSuite#CELEBORN-1056: Introduce Rest API of listing dynamic configuration`
Closes #2311 from SteNicholas/CELEBORN-1056.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
This PR does 3 things:
1. Remove the unnecessary conf QUOTA_MANAGER, since the quota manager is implemented with ConfigService, and ConfigService already has a conf to indicate the implementation.
2. Move the quota manager to the Master side, since only the Master uses it.
3. Support the quota manager using FsConfigService, with a default system-level quota.
### Why are the changes needed?
1. For users who do not have a quota configured, a default quota should apply.
2. The quota manager should support refreshing.
3. QuotaManager should support integrating with ConfigService.
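The default-quota fallback and refresh can be sketched as follows. This is a hypothetical, single-dimension model (one byte quota per user) rather than Celeborn's quota manager; the class, its methods and the single `long` quota are assumptions, and the ConfigService integration is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-user quota with a system-level default.
final class QuotaWithDefault {
    private final Map<String, Long> userQuotaBytes = new HashMap<>();
    private long systemDefaultQuotaBytes;

    QuotaWithDefault(long systemDefaultQuotaBytes) {
        this.systemDefaultQuotaBytes = systemDefaultQuotaBytes;
    }

    synchronized void setUserQuota(String user, long bytes) {
        userQuotaBytes.put(user, bytes);
    }

    /** A user without an explicit quota falls back to the system-level default. */
    synchronized long quotaFor(String user) {
        return userQuotaBytes.getOrDefault(user, systemDefaultQuotaBytes);
    }

    /** True if the user's current usage is still within its quota. */
    boolean withinQuota(String user, long usedBytes) {
        return usedBytes <= quotaFor(user);
    }

    /** Replaces all quotas, e.g. after the underlying config source changed. */
    synchronized void refresh(Map<String, Long> latestUserQuotas, long latestDefault) {
        userQuotaBytes.clear();
        userQuotaBytes.putAll(latestUserQuotas);
        systemDefaultQuotaBytes = latestDefault;
    }
}
```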
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added UT.
Closes #2298 from AngersZhuuuu/CELEBORN-1239.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
Rename `celeborn.worker.sortPartition.reservedMemory.enabled` to `celeborn.worker.sortPartition.prefetch.enabled`. Address [r1469066327](https://github.com/apache/incubator-celeborn/pull/2264/files#r1469066327) of pan3793.
### Why are the changes needed?
`celeborn.worker.sortPartition.reservedMemory.enabled` is misleading. The config actually controls prefetching the original partition files during the first sequential read pass, which leverages the Linux PageCache mechanism to speed up subsequent random reads of them. `celeborn.worker.sortPartition.prefetch.enabled` is a more accurate name.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2312 from SteNicholas/CELEBORN-1254.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Since ConfigService is supported, many configurations can be dynamic. This PR adds an `isDynamic` property for CelebornConf.
### Why are the changes needed?
Make the configuration doc clearer.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UT.
Closes #2308 from AngersZhuuuu/CELEBORN-1051.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
Improve the implementation of `ConfigService` including:
- Removes `celeborn.dynamicConfig.enabled`.
- Changes `celeborn.dynamicConfig.store.backend` to optional.
- Renames `refreshAllCache` to `refreshCache` in `ConfigService`.
- Checks whether the dynamic config file exists and is a file in `FsConfigServiceImpl`.
### Why are the changes needed?
Whether dynamic config is enabled can be determined by whether `celeborn.dynamicConfig.store.backend` is provided, instead of by `celeborn.dynamicConfig.enabled`. The `refreshAllCache` interface can be renamed to `refreshCache` and simply throw `Exception`. Meanwhile, `FsConfigServiceImpl` should check whether the dynamic config file exists and is a file.
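A minimal sketch of the existence and regular-file check using `java.nio.file.Files`. The class name and the choice to throw `IllegalArgumentException` are assumptions for illustration, not the behavior of `FsConfigServiceImpl`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; FsConfigServiceImpl's real checks and error types may differ.
final class DynamicConfigFileCheck {
    private DynamicConfigFileCheck() {}

    /** Returns the path if it exists and is a regular file; otherwise throws. */
    static Path validate(String configuredPath) {
        Path path = Paths.get(configuredPath);
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("Dynamic config file does not exist: " + path);
        }
        // Rejects directories and other non-regular files.
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException("Dynamic config path is not a regular file: " + path);
        }
        return path;
    }
}
```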
### Does this PR introduce _any_ user-facing change?
- Renames `refreshAllCache` to `refreshCache` in `ConfigService`.
### How was this patch tested?
- `ConfigServiceSuiteJ`
Closes #2304 from SteNicholas/CELEBORN-1052.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
In some scenarios, when Celeborn cannot be used, users want an error to be reported directly instead of falling back.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI
Closes #2291 from kerwin-zk/add-config.
Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
This adds a secured port to Celeborn Master which is used for secure communication with LifecycleManager.
This is part of adding authentication support in Celeborn (see CELEBORN-1011).
This change targets just adding the secured port to Master. The following items from the proposal are still pending:
1. Persisting the app secrets in Ratis.
2. Forwarding secrets to Workers and having ability for the workers to pull registration info from the Master.
3. Secured and internal port in Workers.
4. Secured communication between workers and clients.
In addition, since both secured and unsecured communication are supported for backward compatibility and seamless rolling upgrades, an additional change is needed. An app that registers with the Master could try to talk to the workers on unsecured ports, which is a security breach. So the workers need to know whether an app registered with the Master or not, and for that the Master has to propagate the list of unsecured apps to Celeborn workers as well. This can be discussed further in https://issues.apache.org/jira/browse/CELEBORN-1261
### Why are the changes needed?
It is needed for adding authentication support to Celeborn (CELEBORN-1011)
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Added a simple UT.
Closes #2281 from otterc/CELEBORN-1257.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Support a database-based store backend implementation for dynamic configuration management.
### Why are the changes needed?
Currently, Celeborn provides the `FsConfigServiceImpl` implementation for the dynamic config service, which is based on the file system. A database-based store backend implementation can also be supported.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- `ConfigServiceSuiteJ#testDbConfig`
Closes #2273 from RexXiong/CELEBORN-1054.
Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve `Spark Configuration` of `Deploy Spark client` in `deploy.md`.
Fix #2270.
### Why are the changes needed?
It's recommended to improve the `Spark Configuration` of `Deploy Spark client` in the deployment document with Spark Dynamic Resource Allocation support:
```
# Support Spark Dynamic Resource Allocation
# Required Spark version >= 3.5.0
spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
# Required Spark version >= 3.4.0, highly recommended to disable
spark.dynamicAllocation.shuffleTracking.enabled false
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2278 from SteNicholas/CELEBORN-1260.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce application dimension resource consumption metrics for `ResourceConsumptionSource`.
### Why are the changes needed?
At present, `ResourceConsumption` namespace metrics are generated for each user and identified using a metric tag. It's recommended to introduce application-dimension resource consumption metrics that expose the resource consumption of each application on Master and Worker. Monitoring resource consumption in the application dimension shows how much each application actually consumes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- `WorkerInfoSuite#WorkerInfo toString output`
- `PbSerDeUtilsTest#fromAndToPbResourceConsumption`
- `MasterStateMachineSuitej#testObjSerde`
Closes #2161 from SteNicholas/CELEBORN-1174.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
1. Support Celeborn Master (Leader) managing workers by sending events during heartbeats.
2. Add Worker Status to Worker so that the status of workers (e.g. during decommission) can be known.
3. Add an HTTP interface to the Master for handleWorkerEvent/getWorkerEvent.
### Why are the changes needed?
Currently, worker status can only be managed on the worker side. This PR enables the Master to manage the status of all workers: by sending events such as Decommission/Graceful/Exit during heartbeats, workers can asynchronously execute commands from the Master. Meanwhile, the worker status could not be known during worker decommission, so this PR adds a worker status that reports the exact status of the worker.
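The flow described above, where the Master queues an event and hands it to the worker on its next heartbeat, can be sketched in memory as follows. Everything here is hypothetical: the class names, `submitEvent`, `onHeartbeat` and the `WorkerState` values are illustrative, there is no RPC, and the worker applies the event synchronously instead of asynchronously.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of heartbeat-driven worker events.
final class WorkerEventSketch {
    // Event kinds named in the PR description.
    enum EventType { DECOMMISSION, GRACEFUL, EXIT }

    // Hypothetical status values a worker could report back.
    enum WorkerState { NORMAL, IN_DECOMMISSION, IN_GRACEFUL, EXITING }

    private WorkerEventSketch() {}

    /** Master side: at most one pending event per worker. */
    static final class Master {
        private final Map<String, EventType> pending = new HashMap<>();

        synchronized void submitEvent(String workerId, EventType type) {
            pending.put(workerId, type);
        }

        /** Called on a worker heartbeat; returns and clears its pending event, or null. */
        synchronized EventType onHeartbeat(String workerId) {
            return pending.remove(workerId);
        }
    }

    /** Worker side: applies whatever event the heartbeat response carried. */
    static final class Worker {
        final String id;
        WorkerState state = WorkerState.NORMAL;

        Worker(String id) {
            this.id = id;
        }

        void heartbeat(Master master) {
            EventType event = master.onHeartbeat(id);
            if (event == null) {
                return; // nothing to do
            }
            switch (event) {
                case DECOMMISSION: state = WorkerState.IN_DECOMMISSION; break;
                case GRACEFUL: state = WorkerState.IN_GRACEFUL; break;
                case EXIT: state = WorkerState.EXITING; break;
            }
        }
    }
}
```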
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #2255 from RexXiong/CELEBORN-1245.
Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce hot load for CelebornRackResolver.
### Why are the changes needed?
In production environments, machines are often added to the cluster, so the rack configuration also needs to be updated in time.
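One way to hot-load rack information is to re-read the mapping only when its source changes, for example when a topology file's modification time moves. The sketch below assumes that design; `ReloadingRackResolver` and its suppliers are hypothetical and not the actual `CelebornRackResolver` code, and `/default-rack` follows Hadoop's default rack name.

```java
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a rack resolver that reloads its mapping on change.
final class ReloadingRackResolver {
    static final String DEFAULT_RACK = "/default-rack";

    private final LongSupplier lastModified;              // e.g. a file's mtime
    private final Supplier<Map<String, String>> loader;   // reads host -> rack
    private volatile Map<String, String> hostToRack;
    private long loadedVersion;

    ReloadingRackResolver(LongSupplier lastModified, Supplier<Map<String, String>> loader) {
        this.lastModified = lastModified;
        this.loader = loader;
        this.loadedVersion = lastModified.getAsLong();
        this.hostToRack = Map.copyOf(loader.get());
    }

    /** Reloads only when the source changed; returns true if a reload happened. */
    synchronized boolean refreshIfChanged() {
        long current = lastModified.getAsLong();
        if (current == loadedVersion) {
            return false;
        }
        hostToRack = Map.copyOf(loader.get()); // swap in an immutable snapshot
        loadedVersion = current;
        return true;
    }

    String resolve(String host) {
        return hostToRack.getOrDefault(host, DEFAULT_RACK);
    }
}
```

In a long-running service, `refreshIfChanged` could be scheduled periodically, for example with `ScheduledExecutorService.scheduleWithFixedDelay`; readers always see a complete immutable map, never a half-updated one.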
### Does this PR introduce _any_ user-facing change?
master.md
### How was this patch tested?
UTs.
Closes #2246 from leixm/issue_1241.
Lead-authored-by: Xianming Lei <jerrylei@apache.org>
Co-authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
With authentication ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)), the handling of messages from Client by Celeborn services (Master/workers) will go through a SASL handshake. However, messages exchanged between Masters and Workers will not.
With a single netty server on the Master/Workers handling both client and Master/Worker messages, differentiating between the two types of connection is a challenge. It would be better if Masters/Workers had a dedicated port for Clients and a separate one for internal components (Workers and other Masters).
In this change, we propose
- the config that enables creating dedicated internal ports on Masters/Workers.
- creation of the dedicated internal port in just the Master. A subsequent PR will add the creation of the dedicated internal port in Workers.
### Why are the changes needed?
This change is required for adding authentication. ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)).
### Does this PR introduce _any_ user-facing change?
Yes, there are new configurations added.
### How was this patch tested?
Added a UT
Closes #2265 from otterc/CELEBORN-1012.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `celeborn.worker.sortPartition.reservedMemory.enabled` to allow `PartitionFilesSorter` to seek to the position of each block without warming up non-HDFS files.
### Why are the changes needed?
File sorting includes three steps: reading files, sorting MapIds, and writing files. The default block size of Celeborn is 256k and the number of blocks is about 1000, so the sorting itself is very fast, and the main overhead is file reading and writing. There are roughly three options for the whole sorting process:
1. Memory of the file size is allocated in advance, the file is read in as a whole, MapIds are parsed and sorted, and Blocks are written back to the disk in MapId order.
2. No memory is allocated; seek to the location of each block, parse and sort the MapIds, and transfer the Blocks of the original file to the new file in MapId order.
3. Allocate a small block of memory (such as 256k), read the entire file sequentially, parse and sort the MapIds, and transfer the Blocks of the original file to the new file in MapId order.
From an IO perspective, at first glance, solution 1 requires enough memory to hold the file but has only sequential reading and writing; solution 2 has random reading and random writing; solution 3 has sequential writing. Intuitively, solution 1 performs best. However, because of PageCache, when solution 3 writes the new file, the original file is likely already cached in PageCache. At present, `PartitionFilesSorter` supports solution 3 with PageCache, which performs better, especially on HDDs. It is better to also support solution 2 behind a switch config, which seeks to the position of each block and does not warm up non-HDFS files, especially for SSDs.
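In solutions 2 and 3, once the MapIds are parsed, the blocks of the original file are transferred to the new file in MapId order. The sketch below computes only that copy plan and performs no IO; the `Block` and `Copy` types are hypothetical. How the block headers are obtained (seeking per block for solution 2, one sequential read for solution 3) does not change the plan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the MapId-ordered copy plan; not PartitionFilesSorter itself.
final class BlockSortPlan {
    /** One block of the original partition file. */
    static final class Block {
        final int mapId;
        final long offset;
        final long length;

        Block(int mapId, long offset, long length) {
            this.mapId = mapId;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Copy [srcOffset, srcOffset + length) of the original file to dstOffset of the new file. */
    static final class Copy {
        final int mapId;
        final long srcOffset;
        final long dstOffset;
        final long length;

        Copy(int mapId, long srcOffset, long dstOffset, long length) {
            this.mapId = mapId;
            this.srcOffset = srcOffset;
            this.dstOffset = dstOffset;
            this.length = length;
        }
    }

    private BlockSortPlan() {}

    static List<Copy> plan(List<Block> blocksInFileOrder) {
        List<Block> sorted = new ArrayList<>(blocksInFileOrder);
        // List.sort is stable, so blocks of the same MapId keep their original order.
        sorted.sort(Comparator.comparingInt(b -> b.mapId));
        List<Copy> copies = new ArrayList<>(sorted.size());
        long dst = 0; // the new file is written sequentially
        for (Block b : sorted) {
            copies.add(new Copy(b.mapId, b.offset, dst, b.length));
            dst += b.length;
        }
        return copies;
    }
}
```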
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA and cluster.
Closes #2264 from SteNicholas/CELEBORN-1254.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Resource consumption of the worker is not updated when the update interval of resource consumption is greater than the heartbeat interval.
<img width="1741" alt="Screenshot 2024-01-24 14 49 50" src="https://github.com/apache/incubator-celeborn/assets/46485123/21cfd412-c69e-4955-8bc8-155ee470697d">
This pull request introduces the following changes:
1. Avoid the Master repeatedly adding gauges for the same user.
2. For the Worker, user resource consumption can be obtained directly from the worker's snapshot, without needing an update interval.
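Both changes can be illustrated together: register a per-user gauge at most once with `ConcurrentHashMap.computeIfAbsent`, and let the gauge read the current snapshot value whenever it is sampled, so no separate update interval is needed. `UserGaugeRegistry` is a hypothetical stand-in for the real metrics source.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical sketch; not Celeborn's metrics source API.
final class UserGaugeRegistry {
    private final ConcurrentMap<String, LongSupplier> gauges = new ConcurrentHashMap<>();
    private final AtomicInteger registrations = new AtomicInteger();

    /** Registers the gauge only if the user has none yet; returns the active gauge. */
    LongSupplier addGaugeIfAbsent(String user, LongSupplier gauge) {
        return gauges.computeIfAbsent(user, u -> {
            registrations.incrementAndGet(); // runs only for the first registration
            return gauge;
        });
    }

    int registrations() {
        return registrations.get();
    }
}
```

Because the gauge is a supplier that reads the snapshot on every sample, the reported value is always current, rather than whatever a periodic updater last copied.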
### Why are the changes needed?
No.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2260 from AngersZhuuuu/CELEBORN-1252.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add configs' alternatives to doc.
### Why are the changes needed?
To help users use correct configs.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA.
Closes #2253 from FMX/b1241.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `OpenStreamSuccessCount`, `FetchChunkSuccessCount` and `WriteDataSuccessCount` metrics to expose the number of successful stream opens, chunk fetches and data writes in the current worker.
### Why are the changes needed?
The failure ratio of opening streams, fetching chunks and writing data is important for evaluating Celeborn performance and cluster health, but it cannot be computed without the counts of successful stream opens, chunk fetches and data writes.
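With the new success counters, a failure ratio can be derived from the corresponding failure counters. A minimal sketch, assuming the failure counts (for example, failed stream opens) are available alongside the new `*SuccessCount` metrics; `FailureRatio` is a hypothetical helper.

```java
// Hypothetical helper: failure ratio from a failure count and a success count.
final class FailureRatio {
    private FailureRatio() {}

    /** failed / (failed + succeeded), or 0.0 when nothing has been attempted yet. */
    static double of(long failedCount, long successCount) {
        long total = failedCount + successCount;
        return total == 0 ? 0.0 : (double) failedCount / total;
    }
}
```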
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2252 from AngersZhuuuu/CELEBORN-1246.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
`PushDataHandler` should build a replicate client factory to get clients for sending replicate data, instead of using the push client factory. Meanwhile, the timeout checker of `TransportResponseHandler` should run with the `replicate` module instead of `push`.
Follow up #2232.
### Why are the changes needed?
`PushDataHandler` uses the push client factory to create the client for replication, but it should use the replicate client factory; otherwise, the replicate module configuration does not take effect for replication in the worker server. Meanwhile, the timeout checker of `TransportResponseHandler` runs with the `push` module, which does not work well with the worker's replicate client.
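A minimal sketch of why the choice of factory matters, assuming (for illustration) that each client factory reads only the transport settings under its own module prefix, of the form `celeborn.<module>.io.<setting>`. `ClientFactory` and the `connectTimeout` setting are hypothetical; under this assumption, a client built from the push factory never sees replicate-module settings.

```python
from typing import Dict


class ClientFactory:
    """Hypothetical factory configured only by its own module's keys."""

    def __init__(self, module: str, conf: Dict[str, str]) -> None:
        self.module = module
        prefix = f"celeborn.{module}.io."
        # Keep only keys under this module's prefix, with the prefix stripped.
        self.settings = {
            k[len(prefix):]: v for k, v in conf.items() if k.startswith(prefix)
        }


conf = {
    "celeborn.push.io.connectTimeout": "10s",
    "celeborn.replicate.io.connectTimeout": "30s",
}
push_factory = ClientFactory("push", conf)
replicate_factory = ClientFactory("replicate", conf)
# Replication must use replicate_factory for the "30s" setting to apply.
print(push_factory.settings, replicate_factory.settings)
```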
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA and cluster.
Closes#2241 from SteNicholas/CELEBORN-1225.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Refactor metric names.
### Why are the changes needed?
To make the meaning of metrics easier to understand.
### Does this PR introduce _any_ user-facing change?
Yes. See the updated `METRICS.md`, `migration.md` and `monitoring.md`.
### How was this patch tested?
Existing UTs.
Closes#2240 from leixm/metrics_name.
Authored-by: xianminglei <xianming.lei@shopee.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
`PushDataHandler` should build a replicate client factory to get the client for sending replicated data, instead of using the push client factory.
### Why are the changes needed?
`PushDataHandler` uses the push client factory to create the client for replication, but it should use the replicate client factory; otherwise, the replicate module configuration does not take effect for replication in the worker server.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA and cluster.
Closes#2232 from SteNicholas/CELEBORN-1225.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Align the master and worker metrics documented in `METRICS.md` and `monitoring.md` with `MasterSource` and `WorkerSource`.
### Why are the changes needed?
The master and worker metrics in the documentation are currently inconsistent with `MasterSource` and `WorkerSource`. The following metrics should be aligned with `MasterSource` and `WorkerSource`:
- PushDataHandshakeFailCount
- RegionStartFailCount
- RegionFinishFailCount
- PrimaryPushDataHandshakeTime
- ReplicaPushDataHandshakeTime
- PrimaryRegionStartTime
- ReplicaRegionStartTime
- PrimaryRegionFinishTime
- ReplicaRegionFinishTime
- ActiveConnectionCount
- BufferStreamReadBuffer
- ReadBufferDispatcherRequestsLength
- ReadBufferAllocatedCount
- CreditStreamCount
- ActiveMapPartitionCount
- DeviceOSFreeBytes
- DeviceOSTotalBytes
- DeviceCelebornFreeBytes
- DeviceCelebornTotalBytes
- PotentialConsumeSpeed
- UserProduceSpeed
- WorkerConsumeSpeed
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2226 from SteNicholas/CELEBORN-1223.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce the `PausePushDataAndReplicateTime` metric to record the time during which a worker stops receiving pushData from both clients and other workers.
### Why are the changes needed?
`PausePushData` is the number of times a worker stops receiving pushData from clients because of back pressure, while `PausePushDataAndReplicate` is the number of times a worker stops receiving pushData from both clients and other workers because of back pressure. However, `PausePushDataTime` records the time during which a worker stops receiving pushData from clients or other workers, a definition that is confusing for users. It is therefore recommended to introduce the `PausePushDataAndReplicateTime` metric, which records the time during which a worker stops receiving pushData from both clients and other workers because of back pressure.
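A minimal sketch of how such a pause time can be accumulated, assuming one hypothetical `PauseTimer` per pause state (one for pausing push only, another for pausing both push and replicate), so the two durations are never mixed. The class and its injectable clock are illustrative and are not Celeborn's `MemoryManager` implementation.

```python
import time
from typing import Callable, Optional


class PauseTimer:
    """Hypothetical accumulator of time spent in one paused state."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._paused_since: Optional[float] = None
        self.total_seconds = 0.0

    def pause(self) -> None:
        # Repeated pause signals while already paused do not reset the start.
        if self._paused_since is None:
            self._paused_since = self._clock()

    def resume(self) -> None:
        # Add the elapsed paused interval once, then clear the start time.
        if self._paused_since is not None:
            self.total_seconds += self._clock() - self._paused_since
            self._paused_since = None


# Usage: separate timers keep push-only and push-and-replicate pauses apart.
pause_push_data_timer = PauseTimer()
pause_push_data_and_replicate_timer = PauseTimer()
```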
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- [Celeborn Dashboard](https://stenicholas.grafana.net/d/U_qgru_7z/celeborn?orgId=1&refresh=5s)
- `MemoryManagerSuite#[CELEBORN-882] Test MemoryManager check memory thread logic`
Closes#2221 from SteNicholas/CELEBORN-1215.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce the `WriteDataHardSplitCount` metric to record the number of `HARD_SPLIT` partitions of PushData and PushMergedData.
### Why are the changes needed?
Because `PushDataHandler#handlePushData` and `PushDataHandler#handlePushMergedData` log at DEBUG level, `HARD_SPLIT` partitions are not visible unless DEBUG logging is enabled. Therefore, the `WriteDataHardSplitCount` metric should be introduced to record `HARD_SPLIT` partitions of PushData and PushMergedData in `PushDataHandler`.
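A minimal sketch of counting instead of logging, assuming each push yields one status per target partition (PushMergedData can carry data for several partitions). `PushStatus`, `record_push_results` and the `Counter`-based metrics map are hypothetical stand-ins, not Celeborn's `WorkerSource` API.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class PushStatus(Enum):
    """Hypothetical per-partition outcome of a push."""

    SUCCESS = "SUCCESS"
    SOFT_SPLIT = "SOFT_SPLIT"
    HARD_SPLIT = "HARD_SPLIT"


def record_push_results(metrics: Counter, statuses: Iterable[PushStatus]) -> None:
    """Increment WriteDataHardSplitCount once per HARD_SPLIT partition."""
    for status in statuses:
        if status is PushStatus.HARD_SPLIT:
            metrics["WriteDataHardSplitCount"] += 1


metrics: Counter = Counter()
record_push_results(metrics, [PushStatus.HARD_SPLIT])  # PushData: one partition
record_push_results(  # PushMergedData: several partitions
    metrics, [PushStatus.SUCCESS, PushStatus.HARD_SPLIT, PushStatus.SOFT_SPLIT]
)
print(metrics["WriteDataHardSplitCount"])  # 2
```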
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
[Celeborn Dashboard](https://stenicholas.grafana.net/d/U_qgru_7z/celeborn?orgId=1&refresh=5s)
Closes#2217 from SteNicholas/CELEBORN-1214.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
- Fix some typos.
### Why are the changes needed?
- Ditto.
### Does this PR introduce _any_ user-facing change?
- No.
### How was this patch tested?
- No need.
Closes#2214 from Radeity/fix-typo.
Authored-by: Aaron Wang <wangweirao16@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce the `ChunkStreamCount` and `OpenStreamFailCount` metrics for opening streams in `FetchHandler`:
- `WorkerSource` adds the `ChunkStreamCount` and `OpenStreamFailCount` metrics.
- Correct the Grafana dashboard in `celeborn-dashboard.json`, which has been verified via [Celeborn Dashboard](https://stenicholas.grafana.net/d/U_qgru_7z/celeborn?orgId=1&refresh=5s). For example:
  1. `"expr": "metrics_RunningApplicationCount_Value"`
  2. Move the `FetchChunkFailCount` panel to `FetchRelatives` instead of `PushRelatives`.
  3. Update the `gridPos` of some panels.
### Why are the changes needed?
There are no metrics about opening streams in `FetchHandler` of the Celeborn worker.
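A minimal sketch of the two metrics' semantics, assuming `ChunkStreamCount` behaves as a gauge of currently registered chunk streams and `OpenStreamFailCount` as a counter that only increases. `StreamMetrics` and its methods are hypothetical and do not mirror the actual `FetchHandler` code.

```python
from typing import Dict, Optional


class StreamMetrics:
    """Hypothetical bookkeeping for stream-opening metrics."""

    def __init__(self) -> None:
        self._streams: Dict[int, str] = {}
        self._next_id = 0
        self.open_stream_fail_count = 0  # counter: only ever increases

    def open_stream(self, file_name: str, ok: bool) -> Optional[int]:
        """Register a stream and return its id, or count a failure."""
        if not ok:
            self.open_stream_fail_count += 1
            return None
        stream_id = self._next_id
        self._next_id += 1
        self._streams[stream_id] = file_name
        return stream_id

    def close_stream(self, stream_id: int) -> None:
        self._streams.pop(stream_id, None)

    @property
    def chunk_stream_count(self) -> int:
        # Gauge: the number of chunk streams currently registered.
        return len(self._streams)
```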
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
[Celeborn Dashboard](https://stenicholas.grafana.net/d/U_qgru_7z/celeborn?orgId=1&refresh=5s)
Closes#2212 from SteNicholas/CELEBORN-1100.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Revert "[CELEBORN-1150] support io encryption for spark".
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#2208 from FMX/b1150-3.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a cache to the partition sorter and limit its maximum size.
### Why are the changes needed?
To reduce memory consumption in partition sorting by tweaking the index cache.
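A minimal sketch of a size-bounded cache, assuming least-recently-used eviction and a limit on the number of entries. The PR does not state the eviction policy or whether the limit is counted in entries or bytes, and `BoundedIndexCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import List, Optional


class BoundedIndexCache:
    """Hypothetical LRU cache mapping a file key to its index data."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max_entries = max_entries
        self._data: "OrderedDict[str, List[int]]" = OrderedDict()

    def get(self, key: str) -> Optional[List[int]]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, index: List[int]) -> None:
        self._data[key] = index
        self._data.move_to_end(key)
        # Evict least recently used entries once the limit is exceeded.
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)


cache = BoundedIndexCache(max_entries=2)
cache.put("file-a", [0, 128])
cache.put("file-b", [0, 64])
cache.get("file-a")            # touch file-a so file-b becomes the LRU entry
cache.put("file-c", [0, 256])  # exceeds the limit and evicts file-b
print(cache.get("file-b"))     # None
```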
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA and cluster.
Closes#2194 from FMX/B1201.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>