### What changes were proposed in this pull request?
Add `celeborn.quota.enabled` at Master and Client side to enable checking quota
### Why are the changes needed?
`celeborn.quota.enabled` should be added in Master and Client side to enable quota check for Celeborn Master and Client.
### Does this PR introduce _any_ user-facing change?
Add categories of `celeborn.quota,enabled` with `master` and `client`.
### How was this patch tested?
No.
Closes#2318 from AngersZhuuuu/CELEBORN-1277.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
https://github.com/apache/incubator-celeborn/pull/2292#discussion_r1497160753
Based on the above discussion, removing the additional secured port. The existing port will be used for secured communication when auth is enabled.
### Why are the changes needed?
These changes are for enabling authentication
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
This removed additional secured port.
Closes#2327 from otterc/CELEBORN-1257.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: waitinfuture <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Change the default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` from `LEVELDB` to `ROCKSDB`.
### Why are the changes needed?
Because the LevelDB support will be removed, the default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` could be changed to ROCKSDB instead of LEVELDB for preparation of LevelDB deprecation.
Backport:
[[SPARK-45351][CORE] Change spark.shuffle.service.db.backend default value to ROCKSDB](https://github.com/apache/spark/pull/43142)
[[SPARK-45413][CORE] Add warning for prepare drop LevelDB support](https://github.com/apache/spark/pull/43217)
### Does this PR introduce _any_ user-facing change?
The default value of `celeborn.worker.graceful.shutdown.recoverDbBackend` is changed from `LEVELDB` to `ROCKSDB`.
### How was this patch tested?
No.
Closes#2320 from SteNicholas/CELEBORN-1280.
Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
This pr does 2 things:
1. Remove unnecessary conf QUOTA_MANAGER since we implement it with ConfigService and ConfigService already have a conf to indicate the implement method.
2. Move the quota manager to Master side since only master use this
3. Support quota manager use FsConfigService and support default system level
### Why are the changes needed?
1. Many times, for users who do not have a quota configured, we hope to have a default quota that applies to them.
2. Quota manager should support refresh
3. QuotaManager should support integrate with ConfigService
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
added ut
Closes#2298 from AngersZhuuuu/CELEBORN-1239.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
Rename `celeborn.worker.sortPartition.reservedMemory.enabled` to `celeborn.worker.sortPartition.prefetch.enabled`. Address [r1469066327](https://github.com/apache/incubator-celeborn/pull/2264/files#r1469066327) of pan3793.
### Why are the changes needed?
`celeborn.worker.sortPartition.reservedMemory.enabled` is misleading, which should represent that prefetch the original partition files during the first sequential reading path to leverage the Linux PageCache mechanism to speed up the subsequent random reading of them. The config name could use `celeborn.worker.sortPartition.prefetch.enabled` which is is more accurate.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2312 from SteNicholas/CELEBORN-1254.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Since we support ConfigService, many configuration can be dynamic, add `isDynamic` property for CelebornConf in this pr.
### Why are the changes needed?
Make configuration doc more cleear
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existed UT
Closes#2308 from AngersZhuuuu/CELEBORN-1051.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
Improve the implementation of `ConfigService` including:
- Removes `celeborn.dynamicConfig.enabled`.
- Changes `celeborn.dynamicConfig.store.backend` to optional.
- Renames `refreshAllCache` to `refreshCache` in `ConfigService`.
- Checks whether the dynamic config file exists and is file in `FsConfigServiceImpl`.
### Why are the changes needed?
Whether to enable dynamic config could check via whether `celeborn.dynamicConfig.store.backend` is provided, instead of `celeborn.dynamicConfig.enabled`. The `refreshAllCache` interface could rename to `refreshCache` and throw Exception simply. Meanwhile, `FsConfigServiceImpl` should check whether the dynamic config file exists and is file.
### Does this PR introduce _any_ user-facing change?
- Renames `refreshAllCache` to `refreshCache` in `ConfigService`.
### How was this patch tested?
- `ConfigServiceSuiteJ`
Closes#2304 from SteNicholas/CELEBORN-1052.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
For some scenarios, if Celeborn cannot be used, users want to report an error directly instead of fallback.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI
Closes#2291 from kerwin-zk/add-config.
Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
This adds a secured port to Celeborn Master which is used for secure communication with LifecycleManager.
This is part of adding authentication support in Celeborn (see CELEBORN-1011).
This change targets just adding the secured port to Master. The following items from the proposal are still pending:
1. Persisting the app secrets in Ratis.
2. Forwarding secrets to Workers and having ability for the workers to pull registration info from the Master.
3. Secured and internal port in Workers.
4. Secured communication between workers and clients.
In addition, since we are supporting both secured and unsecured communication for backward compatibility and seamless rolling upgrades, there is an additional change needed. An app which registers with the Master can try to talk to the workers on unsecured ports which is a security breach. So, the workers need to know whether an app registered with Master or not and for that Master has to propagate list of un-secured apps to Celeborn workers as well. We can discuss this more with https://issues.apache.org/jira/browse/CELEBORN-1261
### Why are the changes needed?
It is needed for adding authentication support to Celeborn (CELEBORN-1011)
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Added a simple UT.
Closes#2281 from otterc/CELEBORN-1257.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Support database based store backend implementation for dynamic configuration management
### Why are the changes needed?
Currently celeborn provides `FsConfigServiceImpl` implementation for dynamic config service which is based on file system, We cloud Support database based store backend implementation.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- `ConfigServiceSuiteJ#testDbConfig`
Closes#2273 from RexXiong/CELEBORN-1054.
Authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce hot load for CelebornRackResolver.
### Why are the changes needed?
In production environment, we often expand the machine, so the rack configuration also needs to be updated in time.
### Does this PR introduce _any_ user-facing change?
master.md
### How was this patch tested?
UTs.
Closes#2246 from leixm/issue_1241.
Lead-authored-by: Xianming Lei <jerrylei@apache.org>
Co-authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
With authentication ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)), the handling of messages from Client by Celeborn services (Master/workers) will go through a SASL handshake. However, messages exchanged between Masters and Workers will not.
With a single netty server on the Master/Workers handling both client and master/workers messages, differentiating between the two types of connection is a challenge. It will be better if Master/Workers have a separate designated port for Clients and a separate one for internal components (workers and other Masters).
In this change, we propose
- the config that enables creating dedicated internal ports on Masters/Workers.
- creation of the dedicated internal port in just the Master. A subsequent PR will add that creation of the dedicated internal port in Workers.
### Why are the changes needed?
This change is required for adding authentication. ([CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011)).
### Does this PR introduce _any_ user-facing change?
Yes, there are new configurations added.
### How was this patch tested?
Added a UT
Closes#2265 from otterc/CELEBORN-1012.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `celeborn.worker.sortPartition.reservedMemory.enabled` to support that `PartitionFilesSorter` seeks to position of each block and does not warmup for non-hdfs files.
### Why are the changes needed?
File sorting includes three steps: reading files, sorting MapIds, and writing files. The default block of Celeborn is 256k, and the number of blocks is about 1000, so the sorting process is very fast, and the main overhead is file reading and writing. There are roughly three options for the entire sorting process:
1. Memory of the file size is allocated in advance, the file is read in as a whole, MapId is parsed and sorted, and Blocks are written back to the disk in MapId order.
2. No memory is allocated, seek to the location of each block, parse and sort the MapId, and transfer the Blocks of the original file to the new file in the order of MapId.
3. Allocate a small block of memory (such as 256k), read the entire file sequentially, parse and sort the MapId, and transfer the block of the original file to the new file in the order of MapId.
From an IO perspective, at first glance, solution 1 uses sufficient memory and there is no sequential reading and writing; solution 2 has random reading and random writing; solution 3 has sequential writing. Intuitively solution 1 has better performance. Due to the existence of PageCache, when writing a file in solution 3, the original file is likely to be cached in PageCache. `PartitionFilesSorter` support solution3 with PageCache at present, which has better performance especially HDD disk. It's better to support solution2 with switch config that seeks to position of each block and does not warm up for non-hdfs files especially SDD disk.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA and cluster.
Closes#2264 from SteNicholas/CELEBORN-1254.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Resource consumption of worker does not update when update interval of resource consumpution is greater than heartbeat interval.
<img width="1741" alt="截屏2024-01-24 14 49 50" src="https://github.com/apache/incubator-celeborn/assets/46485123/21cfd412-c69e-4955-8bc8-155ee470697d">
This pull request introduces below changes:
1. Avoid master repeat add gauge for same user
2. For worker, user resource consumption can directly get from worker's snapshot, didn't need update interval
### Why are the changes needed?
No.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2260 from AngersZhuuuu/CELEBORN-1252.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add configs' alternatives to doc.
### Why are the changes needed?
To help users use correct configs.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA.
Closes#2253 from FMX/b1241.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
`PushDataHandler` should build replicate factory to get client for sending replicate data instead of push client factory. Meanwhile, timeout checker of `TransportResponseHandler` should run with `replicate` module instead of `push`.
Follow up #2232.
### Why are the changes needed?
`PushDataHandler` uses push client factory to create client for replicating, which should use replicate factory, otherwise replicate module configuration does not take effect for replicating of worker server. Meanwhile, timeout checker of `TransportResponseHandler` runs with `push` module, which does not work well with replicate client for worker.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA and cluster.
Closes#2241 from SteNicholas/CELEBORN-1225.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
`PushDataHandler` should build replicate factory to get client for sending replicate data instead of push client factory.
### Why are the changes needed?
`PushDataHandler` uses push client factory to create client for replicating, which should use replicate factory, otherwise replicate module configuration does not take effect for replicating of worker server.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA and cluster.
Closes#2232 from SteNicholas/CELEBORN-1225.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
- Fix some typos.
### Why are the changes needed?
- Ditto.
### Does this PR introduce _any_ user-facing change?
- No.
### How was this patch tested?
- No need.
Closes#2214 from Radeity/fix-typo.
Authored-by: Aaron Wang <wangweirao16@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Revert "[CELEBORN-1150] support io encryption for spark".
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#2208 from FMX/b1150-3.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a cache in partition sorted and limit its max size.
### Why are the changes needed?
To reduce memory consumption in partition sort by tweak the index cache.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA and cluster.
Closes#2194 from FMX/B1201.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Remove pipeline feature for sort based writer
### Why are the changes needed?
The pipeline feature is added as part of CELEBORN-295, for performance. Eventually, an unresolvable issue that would crash the JVM was identified in https://github.com/apache/incubator-celeborn/pull/1807, and after discussion, we decided to delete this feature.
### Does this PR introduce _any_ user-facing change?
No, the pipeline feature is disabled by default, there are no changes to users who use the default settings.
### How was this patch tested?
Pass GA.
Closes#2196 from pan3793/CELEBORN-891.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Dynamically determine the writing mode in Spark based on the number of partitions.
### Why are the changes needed?
Enhance the flexibility of shuffle writes to improve performance.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Add uts
Closes#2160 from lyy-pineapple/dynamic-write-mode.
Lead-authored-by: liangyongyuan <liangyongyuan@xiaomi.com>
Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
1. To support io encryption for spark.
### Why are the changes needed?
Ditto.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA and manually test on a cluster.
Closes#2135 from FMX/B1150.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Changes the version of the config to 0.5 given that 0.4 will be released soon.
### Why are the changes needed?
See above.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
NA
Closes#2165 from otterc/CELEBORN-1180.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
One user reported that LifecycleManager's parmap can create huge number of threads and causes OOM.

There are four places where parmap is called:
1. When LifecycleManager commits files
2. When LifecycleManager reserves slots
3. When LifecycleManager setup connection to workers
4. When LifecycleManager call destroy slots
This PR fixes the fourth one. To be more detail, this PR eliminates `parmap` when destroying slots, and also replaces `askSync` with `ask`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test and GA.
Closes#2156 from waitinfuture/1167.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: cxzl25 <cxzl25@users.noreply.github.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
This adds the client side Sasl authentication support in the transport layer. Most of this code is taken from Apache Spark.
### Why are the changes needed?
The changes are needed for adding authentication to Celeborn. See [CELEBORN-1011](https://issues.apache.org/jira/browse/CELEBORN-1011).
### Does this PR introduce _any_ user-facing change?
Added a configuration for Sasl request timeout
### How was this patch tested?
Will be adding `CelebornSaslSuiteJ.java` (https://github.com/apache/incubator-celeborn/pull/2105) that tests the end-to-end Sasl flow.
Closes#2139 from otterc/CELEBORN-1157.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
Since we are backporting #2145 to branch-0.3, and the configuration entry `celeborn.client.rpc.shared.threads` in #2145
has a start version of 0.4.0, this update aligns the version accordingly.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes#2153 from cfmcgrady/celeborn-1160-followup.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
One user reported that LifecycleManager's parmap can create huge number of threads and causes OOM.

There are four places where parmap is called:
1. When LifecycleManager commits files
2. When LifecycleManager reserves slots
3. When LifecycleManager setup connection to workers
4. When StorageManager calls close
This PR fixes the first one. To be more detail, this PR eliminates `parmap` when doing committing files, and also replaces `askSync` with `ask`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test and GA.
Closes#2145 from waitinfuture/1160.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
When request slots, filter workers excluded by application
### Why are the changes needed?
If worker alive but can not service, register shuffle will remove the worker from application client exclude list and next shuffle may reserve slots on this worker,this will cause application revive unexpectly
### Does this PR introduce _any_ user-facing change?
Yes, request slots will filter workers excluded by application
### How was this patch tested?
UT,
Closes#2131 from wangshengjie123/fix-request-slots-blacklist.
Authored-by: wangshengjie <wangshengjie3@xiaomi.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
If the user does not use prometheus to collect monitoring metrics, but rather some other ones. Using metrics in JSON format would be more user-friendly.The PR supports JSON format for metrics.
### Why are the changes needed?
Ditto.
### Does this PR introduce _any_ user-facing change?
Metrics supports JSON format
### How was this patch tested?
Cluster test.
Closes#2089 from suizhe007/CELEBORN-1122.
Authored-by: qinrui <qr7972@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
Follow up #2100. Mainly changes the package from scala to java of the codes in #2100. Meanwhile, `FsConfigServiceImpl#refresh` should directly return instead of refreshing configs.
### Why are the changes needed?
This PR follow up dynamic `ConfigService` at `SystemLevel` and `TenantLevel`, Dynamic configuration is a type of configuration that can be changed at runtime as needed in #2100. The implementation of `ConfigService` is based on Java codes, which are put into Scala package and cause that the spotless plugin does not format well. After the changes of the pull request, there are much code style changes generated from the package moving behavior.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`ConfigServiceSuiteJ`.
Closes#2125 from SteNicholas/CELEBORN-1052.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
The `clientPushBufferMaxSize` config is also used by `CelebornInputStreamImpl`, it's a config about push side and should not be used by fetch side. This pr introduces a fetch config to replace it.
### Why are the changes needed?
As above
### Does this PR introduce _any_ user-facing change?
Yes, a new config `celeborn.client.fetch.buffer.size` is introduced.
### How was this patch tested?
Pass CI
Closes#2118 from exmy/celeborn-1145.
Authored-by: exmy <xumovens@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Introduce JVM monitoring in Celeborn Worker using JVMQuake to enable early detection of memory management issues and facilitate fast failure.
### Why are the changes needed?
When facing out-of-control memory management in Celeborn Worker we typically use JVMkill as a remedy by killing the process and generating a heap dump for post-analysis. However, even with jvmkill protection, we may still encounter issues caused by JVM running out of memory, such as repeated execution of Full GC without performing any useful work during the pause time. Since the JVM does not exhaust 100% of resources, JVMkill will not be triggered. Therefore JVMQuake is introduced to provide more granular monitoring of GC behavior, enabling early detection of memory management issues and facilitating fast failure. Refers to the principle of [jvmquake](https://github.com/Netflix-Skunkworks/jvmquake) which is a JVMTI agent that attaches to your JVM and automatically signals and kills it when the program has become unstable.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`JVMQuakeSuite`
Closes#2061 from SteNicholas/CELEBORN-1092.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
This PR introduce dynamic ConfigService at SystemLevel and TenantLevel, Dynamic configuration is a type of configuration that can be changed at runtime as needed. It can be used at system level/tenant level. When applying dynamic configuration, the priority order is as follows: tenant level overrides system level, which in turn overrides static configuration(CelebornConf). This means that if a configuration is defined at the tenant level, it will be used instead of the system level or static configuration(CelebornConf). If the tenant-level configuration is missing,
the system-level configuration will be used. If the system-level configuration is also missing, CelebornConf
will be used as the default value.
There are several other tasks related to this feature that will be implemented in the future.
- [ ] [Add isDynamic property for CelebornConf](https://issues.apache.org/jira/browse/CELEBORN-1051)
- [ ] [Support DB based Configserver](https://issues.apache.org/jira/browse/CELEBORN-1054)
- [ ] [Add restAPI for configuration management](https://issues.apache.org/jira/browse/CELEBORN-1056)
### Why are the changes needed?
The current configuration of the server (CelebornConf) is static. When the configuration is changed, the service needs to be restarted. This PR introduces a dynamic configuration solution. The server side can use dynamic configuration as needed. At the same time, it is considered that the tenant level will be supported in the future (such as supporting tenant level dynamic quota control) configuration, so this time we will also consider supporting dynamic tenant-level configuration, and this PR will provide a default implementation based on the file system.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT
Closes#2100 from RexXiong/CELEBORN-1052.
Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Currently, Celeborn uses replication to handle shuffle data lost for celeborn shuffle reader, this PR implements an alternative solution by Spark stage resubmission.
Design doc:
https://docs.google.com/document/d/1dkG6fww3g99VAb1wkphNlUES_MpngVPNg8601chmVp8/edit
### Why are the changes needed?
Spark stage resubmission uses less resources compared with replication, and some Celeborn users are also asking for it
### Does this PR introduce _any_ user-facing change?
a new config celeborn.client.fetch.throwsFetchFailure is introduced to enable this feature
### How was this patch tested?
two UTs are attached, and we also tested it in Ant Group's Dev spark cluster
Closes#1924 from ErikFang/Re-run-Spark-Stage-for-Celeborn-Shuffle-Fetch-Failure.
Lead-authored-by: Erik.fang <fmerik@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Make Celeborn read configs from HADOOP_COND_DIR.
2. Remove unnecessary Kerberos configs.
### Why are the changes needed?
To support HDFS with Kerberos.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA and cluster.
Closes#2082 from FMX/B1116.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Unregister application to Celeborn master After Application stopped, then master will expire the related shuffle resource immediately, resulting in resource savings.
### Why are the changes needed?
Currently Celeborn master expires the related application shuffle resource only when application is being checked timeout,
this would greatly delay the release of resources, which is not conducive to saving resources.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
PASS GA
Closes#2075 from RexXiong/CELEBORN-1112.
Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Support `celeborn.worker.storage.disk.reserve.ratio` to configure worker reserved ratio for each disk.
### Why are the changes needed?
`CelebornConf` supports to configure celeborn worker reserved space for each disk, which space is absolute. `CelebornConf` could support `celeborn.worker.storage.disk.reserve.ratio` to configure worker reserved ratio for each disk. The minimum usable size for each disk should be the max space between the reserved space and the space calculate via reserved ratio.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`SlotsAllocatorSuiteJ`
Closes#2071 from SteNicholas/CELEBORN-1110.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Adding Kerberos support for HDFS storage type.
The following five parameters need to be configured:
| key | value |
| :--: | :--: |
| celeborn.storage.hdfs.kerberos.enabled | true |
| celeborn.storage.hdfs.kerberos.principal | userREALM |
| celeborn.storage.hdfs.kerberos.keytab | /path/test.keytab |
| celeborn.hadoop.hadoop.security.authorization | kerberos |
| celeborn.hadoop.dfs.namenode.kerberos.principal | hdfs/_HOSTREALM |
### Why are the changes needed?
Connecting to HDFS with Kerberos enabled requires support for keytab login.
### Does this PR introduce _any_ user-facing change?
Add 3 configurations.
celeborn.storage.hdfs.kerberos.enabled
celeborn.storage.hdfs.kerberos.principal
celeborn.storage.hdfs.kerberos.keytab
### How was this patch tested?
Test in Kerberos enabled HDFS cluster.
Closes#2072 from liujiayi771/hdfs-kerberos.
Authored-by: joey.ljy <joey.ljy@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
1.To support `celeborn.storage.activeTypes` in Client.
2.Master will ignore slots for "UNKNOWN_DISK".
### Why are the changes needed?
Enable client application to select storage types to use.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
GA and cluster.
Closes#2045 from FMX/B1081.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
This change makes the maximum default number of Netty threads configurable. Previously, this value was hardcoded to 64, which could be small for certain environments. While it's possible to configure the number of Netty server and client threads individually for each module, providing an option to increase the default value offers greater convenience.
### Why are the changes needed?
The change offers convenience.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added a UT
Closes#2065 from otterc/CELEBORN-1107.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Support specifying the number of dispatcher threads for each role, especially shuffle client side. For shuffle client, there is only RpcEndpointVerifier endpoint which handles not many requests, one thread is enough. The rpc env of other roles has only two endpoints at most, using a shared event loop is reasonable. I am not sure if there is a need to add rpc requests to shuffle client. So add specific parameters to specify the dispatcher threads here.
And change the dispatcher thread pool name in order to distinguish it from spark's.
### Why are the changes needed?
Ditto
### Does this PR introduce _any_ user-facing change?
Yes, add params celeborn.\<role>.rpc.dispatcher.threads
### How was this patch tested?
Manual test and UT
Closes#2003 from onebox-li/my_dev.
Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
The `cleaner` of `Worker` executes the `StorageManager#cleanupExpiredShuffleKey` to clean expired shuffle keys with daemon cached thread pool. The optimization speeds up cleaning including expired shuffle keys of ChunkManager to avoid memory leak.
### Why are the changes needed?
`ChunkManager#streams` could lead memory leak when the speed of cleanup is slower than expiration for expired shuffle of worker. The behavior that `ChunkStreamManager` cleanup expired shuffle key should be optimized to avoid memory leak, which causes that the VM thread of worker is 100%.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`WorkerSuite#clean up`.
Closes#2053 from SteNicholas/CELEBORN-1094.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add the metric `ResourceConsumption` to monitor each user's quota usage of Celeborn Worker.
### Why are the changes needed?
The metric `ResourceConsumption` supports to monitor each user's quota usage of Celeborn Master at present. The usage of Celeborn Worker also needs to monitor.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Internal tests.
Closes#2059 from SteNicholas/CELEBORN-247.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Seperate `overHighWatermark` check to a dedicated thread, let this value can shared better and lighten `CongestionController#isUserCongested` logic.
### Why are the changes needed?
Ditto.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test and UT.
Closes#2041 from onebox-li/congest-check.
Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
`CelebornConf` adds `celeborn.client.shuffle.decompression.lz4.xxhash.instance` to configure fastest available `XXHashFactory` instance for checksum of `Lz4Decompressor`. Fix#2043.
### Why are the changes needed?
`Lz4Decompressor` creates the checksum with `XXHashFactory#fastestInstance`, which returns the fastest available `XXHashFactory` instance that uses nativeInstance at default. The fastest available `XXHashFactory` instance for checksum of `Lz4Decompressor` could be supported to configure instead of dependency on the class loader is the system class loader, which method is as follows:
```
/**
* Returns the fastest available {link XXHashFactory} instance. If the class
* loader is the system class loader and if the
* {link #nativeInstance() native instance} loads successfully, then the
* {link #nativeInstance() native instance} is returned, otherwise the
* {link #fastestJavaInstance() fastest Java instance} is returned.
* <p>
* Please read {link #nativeInstance() javadocs of nativeInstance()} before
* using this method.
*
* return the fastest available {link XXHashFactory} instance.
*/
public static XXHashFactory fastestInstance() {
if (Native.isLoaded()
|| Native.class.getClassLoader() == ClassLoader.getSystemClassLoader()) {
try {
return nativeInstance();
} catch (Throwable t) {
return fastestJavaInstance();
}
} else {
return fastestJavaInstance();
}
}
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- `CelebornConfSuite`
- `ConfigurationSuite`
Closes#2050 from SteNicholas/CELEBORN-1095.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a configuration "celeborn.worker.storage.expireDirs.timeout" with a default value of 6h in rsswork. This configuration is used to set the expiration time for app local directories.
https://issues.apache.org/jira/browse/CELEBORN-1046
### Why are the changes needed?
When Celeborn periodically deletes the directories of apps, it determines whether the app needs to be deleted based on the shuffleKeySet in memory. However, this method may not accurately indicate the completion of the app and could potentially lead to the unintentional deletion of shuffle data.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1998 from wilsonjie/CELEBORN-1046.
Authored-by: sunjunjie <sunjunjie@zto.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Replace
```properties
celeborn.metrics.master.prometheus.host
celeborn.metrics.master.prometheus.port
celeborn.metrics.worker.prometheus.host
celeborn.metrics.worker.prometheus.port
```
With
```properties
celeborn.master.http.host
celeborn.master.http.port
celeborn.worker.http.host
celeborn.worker.http.port
```
### Why are the changes needed?
The `celeborn.master.metrics.prometheus.port` and `celeborn.metrics.worker.prometheus.port` bind port not only serve prometheus metrics, but also provide some useful API services.
https://celeborn.apache.org/docs/latest/monitoring/#rest-api
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1919 from cxzl25/CELEBORN-983.
Lead-authored-by: sychen <sychen@ctrip.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix some typos
### Why are the changes needed?
Ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
-
Closes#1983 from onebox-li/fix-typo.
Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>