### What changes were proposed in this pull request?
As title, introduce metrics_ShuffleFallbackCount_Value.
### Why are the changes needed?
To provide insight into how many shuffles fall back to Spark's built-in shuffle service. This helps us deprecate the ESS progressively.
Currently, we plan to set `celeborn.client.spark.shuffle.fallback.numPartitionsThreshold` so that shuffles with too many partitions (for example, 50k) fall back.
In the future, we plan to limit the maximum acceptable number of shuffle partitions so that bad jobs are rejected and do not impact Celeborn master health.
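As a rough illustration of the idea, the following sketch pairs a partition-count threshold with a fallback counter. The class and method names are hypothetical and not Celeborn's actual implementation; the strict "greater than" comparison is also an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not Celeborn's actual fallback policy class. */
final class PartitionThresholdFallbackSketch {
    // Stands in for the fallback-count metric: number of shuffles that fell back.
    private final AtomicLong fallbackCount = new AtomicLong();
    private final int numPartitionsThreshold;

    PartitionThresholdFallbackSketch(int numPartitionsThreshold) {
        this.numPartitionsThreshold = numPartitionsThreshold;
    }

    /** Returns true (and increments the counter) when the shuffle should fall back. */
    boolean shouldFallback(int numPartitions) {
        // Assumption: only a partition count strictly above the threshold triggers fallback.
        boolean fallback = numPartitions > numPartitionsThreshold;
        if (fallback) {
            fallbackCount.incrementAndGet();
        }
        return fallback;
    }

    long fallbackCount() {
        return fallbackCount.get();
    }

    public static void main(String[] args) {
        PartitionThresholdFallbackSketch policy = new PartitionThresholdFallbackSketch(50_000);
        System.out.println(policy.shouldFallback(60_000));
        System.out.println(policy.shouldFallback(1_000));
        System.out.println(policy.fallbackCount());
    }
}
```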
### Does this PR introduce _any_ user-facing change?
Yes, new metrics.
### How was this patch tested?
UT.
<img width="1188" alt="image" src="https://github.com/user-attachments/assets/8193c12c-5dc9-4783-b64b-6a8449a1bea4">
Closes #2866 from turboFei/record_fallback.
Lead-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
When the Celeborn shuffle service is used, batch and streaming jobs cannot both be submitted to the same Flink session cluster. This should be highlighted in the docs.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No need.
Closes #2879 from reswqa/session-doc.
Authored-by: Weijie Guo <reswqa@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Follow-up of https://github.com/apache/celeborn/pull/2835.
Only use dynamic resources when candidates are not enough.
Also change the way availableWorkers is obtained, from the heartbeat to the requestSlots RPC, to avoid burdening the heartbeat.
### Why are the changes needed?
No
### Does this PR introduce _any_ user-facing change?
Add another configuration.
### How was this patch tested?
UT
Closes #2852 from zaynt4606/clb1636-flu2.
Authored-by: szt <zaynt4606@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve Celeborn CLI user guide including:
- Add license of Celeborn CLI user guide.
- Optimize the introduction of setup and usage for Celeborn CLI.
- Optimize the navigation of Celeborn CLI to combine Celeborn Ratis Shell.
### Why are the changes needed?
The Celeborn CLI user guide has no license header. Meanwhile, the user guide can be improved in several places, including the license, the navigation, and the introduction of setup and usage.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2875 from SteNicholas/CELEBORN-1678.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
When users deploy using the release binary as outlined in the documentation, the instructions for copying the client JAR can be unclear.
### Why are the changes needed?
No
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

Closes #2877 from zaynt4606/md.
Authored-by: szt <zaynt4606@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Add Flink hybrid shuffle doc
### Why are the changes needed?
We need the doc for the new hybrid shuffle mode.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
No need.
Closes #2867 from reswqa/add-hs-doc.
Authored-by: Weijie Guo <reswqa@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
Currently, only Flink retries establishing a client when a connection problem occurs. This would be beneficial for all other engines to implement as well.
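A minimal sketch of such a retry loop is shown below, assuming connection failures surface as `UncheckedIOException`. The helper name, the attempt limit, and the fixed wait are illustrative assumptions, not Celeborn's actual client factory API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical sketch of retrying client creation on connection problems. */
final class ClientCreateRetrySketch {
    /** Calls the factory up to maxAttempts times, waiting waitMs between failed attempts. */
    static <T> T createWithRetry(Supplier<T> factory, int maxAttempts, long waitMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UncheckedIOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.get();
            } catch (UncheckedIOException e) {
                // Only I/O failures are retried; any other exception propagates immediately.
                last = e;
                if (attempt < maxAttempts) {
                    sleepQuietly(waitMs);
                }
            }
        }
        throw last;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String client = createWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new UncheckedIOException(new IOException("connection refused"));
            }
            return "client";
        }, 5, 10L);
        System.out.println(client + " created after " + calls[0] + " attempts");
    }
}
```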
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT
Closes #2855 from RexXiong/CELEBORN-1673.
Lead-authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Co-authored-by: lvshuang.xjs <lvshuang.xjs@taobao.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
adding user guide to README for cli
### Why are the changes needed?
better user experience when using CLI.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
N/A
Closes #2862 from akpatnam25/CELEBORN-1678.
Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support interrupt shuffle on client side.
I will develop the following functions in order
1. Client supports interrupt shuffle
2. Master supports calculating app-level shuffle usage
### Why are the changes needed?
The current storage quota logic can only limit new shuffles and cannot limit writes to existing shuffles. In our production environment, we see the following scenario: the cluster is small, but a single shuffle of a user's app is large and occupies disk resources, so we want to interrupt such shuffles.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This part cannot be tested independently. Additional tests will be added after completing the second part.
Closes #2801 from leixm/CELEBORN-1577-1.
Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Currently, the ChangePartitionManager retrieves workers from the LifeCycleManager's workerSnapshot. However, during the revival process in reallocateChangePartitionRequestSlotsFromCandidates, it does not account for newly added available workers resulting from elastic contraction and expansion. This PR addresses this issue by updating the candidate workers in the ChangePartitionManager to use the available workers reported in the heartbeat from the master.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT
Closes #2835 from zaynt4606/clbdev.
Authored-by: szt <zaynt4606@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Support ratio threshold of unhealthy disks for excluding worker with `celeborn.master.excludeWorker.unhealthyDiskRatioThreshold`.
### Why are the changes needed?
We often encounter issues such as disk input/output errors in production practice. When a bad disk occurs, the worker will be maintained to decommission for repairing the machine disk. The reason is that generally the fault will be repaired in time after it is discovered. It is possible that the machine will not trigger all disk failures if it is out of warranty. It can be replaced directly when it is under warranty. If the disk fails after it is out of warranty, you need to purchase the disk yourself for replacement. At the same time, submitting the disk for repair at one time will affect the failure rate judgment of the system group and scenario. In addition, the occurrence of bad disks will bring about some management problems, such as continuous alarms, and the handling of disk failures is relatively customized.
Therefore, it's recommended to configure a ratio threshold of unhealthy disks for excluding a worker, so that the worker can be excluded without waiting for all of its disks to become unhealthy.
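The check can be pictured roughly as follows. This is a sketch only: the helper is hypothetical, and treating "ratio reaches the threshold" (`>=`) as the trigger is an assumption rather than the confirmed semantics of `celeborn.master.excludeWorker.unhealthyDiskRatioThreshold`.

```java
/** Hypothetical sketch of excluding a worker by its ratio of unhealthy disks. */
final class UnhealthyDiskRatioSketch {
    /**
     * Returns true when the worker should be excluded.
     * Assumption: exclusion triggers once unhealthy/total reaches the threshold.
     */
    static boolean shouldExclude(int unhealthyDisks, int totalDisks, double ratioThreshold) {
        if (totalDisks <= 0) {
            return false; // no disk information, nothing to judge
        }
        double ratio = (double) unhealthyDisks / totalDisks;
        return ratio >= ratioThreshold;
    }

    public static void main(String[] args) {
        // With a threshold of 0.5, two bad disks out of four are enough to exclude the worker.
        System.out.println(shouldExclude(2, 4, 0.5));
        // With a threshold of 1.0, the worker is excluded only when every disk is unhealthy.
        System.out.println(shouldExclude(3, 4, 1.0));
    }
}
```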
### Does this PR introduce _any_ user-facing change?
Introduce `celeborn.master.excludeWorker.unhealthyDiskRatioThreshold` to configure max ratio of unhealthy disks for excluding worker.
### How was this patch tested?
Cluster test.
Closes #2812 from SteNicholas/CELEBORN-1651.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support revising lost shuffle IDs in long-running jobs such as Flink batch jobs.
### Why are the changes needed?
1. To support revising lost shuffles.
2. To add an HTTP endpoint to revise lost shuffles manually.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
Cluster tests.
Closes #2746 from FMX/b1600.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
CongestionController supports dynamic config.
### Why are the changes needed?
Currently, Celeborn only supports quota management based on disk file bytes/count, and this quota management cannot cope with sudden increases in traffic, which can destabilize the cluster.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
UT.
Closes #2817 from leixm/CELEBORN-1487-2.
Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
`NettyMemoryMetrics` supports `numHeapArenas`, `numDirectArenas`, `tinyCacheSize`, `smallCacheSize`, `normalCacheSize`, `numThreadLocalCaches` and `chunkSize` from `PooledByteBufAllocatorMetric`. Meanwhile, remove `server_` prefix from metric name of netty memory metric in `monitoring.md`.
### Why are the changes needed?
`PooledByteBufAllocatorMetric` provides the following API to support netty memory metrics:
```
public int numHeapArenas() {
    return this.allocator.numHeapArenas();
}
public int numDirectArenas() {
    return this.allocator.numDirectArenas();
}
public List<PoolArenaMetric> heapArenas() {
    return this.allocator.heapArenas();
}
public List<PoolArenaMetric> directArenas() {
    return this.allocator.directArenas();
}
public int numThreadLocalCaches() {
    return this.allocator.numThreadLocalCaches();
}
public int tinyCacheSize() {
    return this.allocator.tinyCacheSize();
}
public int smallCacheSize() {
    return this.allocator.smallCacheSize();
}
public int normalCacheSize() {
    return this.allocator.normalCacheSize();
}
public int chunkSize() {
    return this.allocator.chunkSize();
}
public long usedHeapMemory() {
    return this.allocator.usedHeapMemory();
}
public long usedDirectMemory() {
    return this.allocator.usedDirectMemory();
}
```
`NettyMemoryMetrics` currently only supports `usedHeapMemory` and `usedDirectMemory`, but it could also support `numHeapArenas`, `numDirectArenas`, `tinyCacheSize`, `smallCacheSize`, `normalCacheSize`, `numThreadLocalCaches` and `chunkSize` from `PooledByteBufAllocatorMetric`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
[Celeborn Grafana Dashboard](https://stenicholas.grafana.net/public-dashboards/a520ca36a33843a38bbde28387023f97)
Closes #2802 from SteNicholas/CELEBORN-1640.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a REST API and CLI for container info. Users can configure this API based on whichever cluster manager they are using.
### Why are the changes needed?
see above
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
added UTs
Closes #2758 from akpatnam25/CELEBORN-1599.
Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a randomUUID as a suffix to the application id so that it is unique.
### Why are the changes needed?
Currently, we cannot guarantee that the application id is really unique, which may lead to data issues.
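A minimal sketch of the idea, assuming a `-` separator between the original id and the UUID (the separator and the helper name are illustrative, not the format used by the patch):

```java
import java.util.UUID;

/** Hypothetical sketch of making an application id unique with a random UUID suffix. */
final class UniqueAppIdSketch {
    static String withUuidSuffix(String applicationId) {
        // Assumption: "-" is used as the separator; the real format may differ.
        return applicationId + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Two submissions that reuse the same engine-provided id still get distinct ids.
        System.out.println(withUuidSuffix("app-20240101-0001"));
        System.out.println(withUuidSuffix("app-20240101-0001"));
    }
}
```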
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
test locally
Closes #2810 from chenkovsky/feature/uuid_appid.
Authored-by: Chongchen Chen <chenkovsky@qq.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Support passing a tag expression in the RequestSlots request. Clients can pass the tags using CelebornConf. Default tag configs for system/tenant/user will be supported in follow-up PRs.
### Why are the changes needed?
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-11+Supporting+Tags+in+Celeborn
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UTs passed, will add more UTs while integrating TagsManager with ConfigService.
Closes #2770 from s0nskar/request-slots.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce support for controlling traffic by user/worker traffic speed.
### Why are the changes needed?
Currently, Celeborn only supports quota management based on disk file bytes/count, and this quota management cannot cope with sudden increases in traffic, which can destabilize the cluster.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
UTs.
Closes #2797 from leixm/issue_1487_1.
Authored-by: Xianming Lei <31424839+leixm@users.noreply.github.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a config to bypass memory check when sorting shuffle files.
### Why are the changes needed?
If a Celeborn worker has quite a large memory and serves both Spark and Flink engines, this config should be enabled.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
Cluster test.
Closes #2798 from FMX/b1637.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Introduce Blaze support document.
### Why are the changes needed?
[Blaze](https://github.com/kwai/blaze) supports Celeborn as a remote shuffle service. It's recommended to add a Blaze support document that introduces Blaze usage.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2787 from SteNicholas/CELEBORN-1635.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add navigation for `REST API` document.
### Why are the changes needed?
The `REST API` document does not have any navigation; it is better to add navigation to guide users through the REST API.
<img width="1438" alt="image" src="https://github.com/user-attachments/assets/b5b3a14a-38d4-4769-bffb-3acd571d5dbb">
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2775 from SteNicholas/navigate-rest-api.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a worker metric that publishes the unreleased shuffle count when a worker is decommissioned.
<img width="885" alt="Screenshot 2024-09-16 at 11 12 33 AM" src="https://github.com/user-attachments/assets/c81f36c1-cbed-44fe-814b-88f3ff29875d">
### Why are the changes needed?
Currently, Celeborn doesn't publish the count of unreleased shuffle keys that get lost when a worker is decommissioned. This can be useful for monitoring and for configuring the `forceExitTimeout`.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
NA
Closes #2711 from s0nskar/unrelease_shuffle_metric.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
To speed up resource release, this PR unregisters shuffles in batch.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT & Local cluster testing
Closes #2701 from zaynt4606/batchUnregister.
Lead-authored-by: szt <zaynt4606@163.com>
Co-authored-by: Zaynt <shuaizhentao.szt@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Update `advertised endpoint` of master service log in startup document.
### Why are the changes needed?
#2713 has changed the startup log of the master service in `NettyRpcEnv`, so the log in the startup document should be updated accordingly.
```
logInfo(s"Starting RPC Server [${config.name}] on ${config.bindAddress}:$actualPort " +
s"with advertised endpoint ${config.advertiseAddress}:$actualPort")
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2773 from SteNicholas/CELEBORN-1513.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add `emptyFilePrimaryIds` and `emptyFileReplicaIds` of worker service log in startup document.
### Why are the changes needed?
#2300 has added `emptyFilePrimaryIds` and `emptyFileReplicaIds` to the startup log of the worker service in `Controller`, so they should also be added to the log in the startup document.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2774 from SteNicholas/CELEBORN-914.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Update name of master service from `MasterSys` to `Master` in startup document to follow up https://github.com/apache/celeborn/pull/2003/files#r1365454256.
### Why are the changes needed?
#2003 has already changed the name of the master and worker services, so the name in the startup logs of the document should be updated as well.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2772 from SteNicholas/CELEBORN-1058.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Update document link of `Get Started With Velox` and `Get Started With ClickHouse` in `glutensupport.md`. Meanwhile, replace `gluten-celeborn-package-xx-SNAPSHOT.jar` with `(The bundled Gluten Jar. Make sure -Pceleborn is specified when it is built.)`, which refers to https://github.com/apache/incubator-gluten/pull/6692.
### Why are the changes needed?
The document links of `Get Started With Velox` and `Get Started With ClickHouse` are no longer accessible because their URLs have changed.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2762 from SteNicholas/CELEBORN-1486.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support SSL for celeborn RESTful service.
### Why are the changes needed?
For HTTP SSL connection requirements.
### Does this PR introduce _any_ user-facing change?
No, SSL is disabled by default.
### How was this patch tested?
Integration testing.
```
celeborn.master.http.ssl.enabled=true
celeborn.master.http.ssl.keystore.path=/hadoop/keystore.jks
celeborn.master.http.ssl.keystore.password=xxxxxxx
```
<img width="1143" alt="image" src="https://github.com/user-attachments/assets/2334561d-1de3-4b38-bc80-5d5d86d3b8ff">
<img width="695" alt="image" src="https://github.com/user-attachments/assets/e3877468-cc3b-4a4a-bf75-2994f557a104">
Closes #2756 from turboFei/HADP_1609_ssl2.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Implement the worker write process for Flink Hybrid Shuffle.
### Why are the changes needed?
We already support the tiered producer writing data from Flink to the worker. In this PR, we enable the worker to write this kind of data to storage.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
no need.
Closes #2741 from reswqa/cip6-6-pr.
Authored-by: Weijie Guo <reswqa@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix `RaftPeerId` generated by command of `raftMetaConf` to use real PeerId for `local` command in `celeborn_ratis_shell.md` to sync document [cli.md](https://github.com/apache/ratis/blob/ratis-3.0.1/ratis-docs/src/site/markdown/cli.md).
### Why are the changes needed?
Celeborn has already bumped Ratis version from 3.0.1 to 3.1.0. Ratis v3.1.0 has already fixed RaftPeerId generated by command of "raftMetaConf" to use real PeerId in `raft-meta.conf` and store back to generated `new-raft-meta.conf`.
Backport: https://github.com/apache/ratis/pull/1060
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2748 from SteNicholas/CELEBORN-1466.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix the unique key to reflect the correct column names.
### Why are the changes needed?
Running the current DB scripts gives the error below because the `user` column was renamed to `name` (https://github.com/apache/celeborn/pull/2340) but the unique key was not updated accordingly.
```
mysql> CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config
-> (
-> id int NOT NULL AUTO_INCREMENT,
-> cluster_id int NOT NULL,
-> tenant_id varchar(255) NOT NULL,
-> level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,USER',
-> name varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
-> config_key varchar(255) NOT NULL,
-> config_value varchar(255) NOT NULL,
-> type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
-> gmt_create timestamp NOT NULL,
-> gmt_modify timestamp NOT NULL,
-> PRIMARY KEY (id),
-> UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `user`, `config_key`)
-> );
ERROR 1072 (42000): Key column 'user' doesn't exist in table
```
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
Tested in local DB
```
mysql> CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config
-> (
-> id int NOT NULL AUTO_INCREMENT,
-> cluster_id int NOT NULL,
-> tenant_id varchar(255) NOT NULL,
-> level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,USER',
-> name varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
-> config_key varchar(255) NOT NULL,
-> config_value varchar(255) NOT NULL,
-> type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
-> gmt_create timestamp NOT NULL,
-> gmt_modify timestamp NOT NULL,
-> PRIMARY KEY (id),
-> UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `name`, `config_key`)
-> );
Query OK, 0 rows affected (0.01 sec)
```
Closes #2740 from s0nskar/fix-db-script.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
enrich the docs for supporting wildcard address bind in this [PR](https://github.com/apache/celeborn/pull/2713).
### Why are the changes needed?
better docs
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
N/A - just docs change
Closes #2736 from akpatnam25/CELEBORN-1513-doc-followup.
Authored-by: Aravind Patnam <akpatnam25@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Support wildcard bind for RPC and HTTP servers. When wildcard address is used, the service is able to listen to both ipv4 and ipv6 traffic in dual-stack environments.
The specific scenario where this becomes relevant is as follows:
If some of the compute infrastructure is IPv4 only, some v6 only and others dual stack - the way we can have a single Celeborn infra to cater to all is by:
a) Set bind.preferip to false - so that advertised address is the host and not IP.
b) bind to wild card address
With both in place, the v4 only compute nodes will resolve the v4 address and connect to v4 ip/port.
Likewise, for v6 only.
Dual stack compute nodes will use prefer ipv6 Java flag to resolve to either v4 or v6.
This is how we are handling the combination of scenarios internally.
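For illustration, binding to the wildcard address in plain Java looks roughly like the sketch below. It uses only `java.net` and is not Celeborn's RPC or HTTP server code; whether the socket actually accepts both IPv4 and IPv6 traffic depends on the host's network stack and JVM settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical sketch of a wildcard bind using the JDK socket API. */
final class WildcardBindSketch {
    static ServerSocket bindWildcard(int port) {
        ServerSocket server = null;
        try {
            server = new ServerSocket();
            // InetSocketAddress(int) uses the wildcard address instead of one specific IP.
            server.bind(new InetSocketAddress(port));
            return server;
        } catch (IOException e) {
            if (server != null) {
                try {
                    server.close();
                } catch (IOException ignored) {
                    // best-effort cleanup of the unbound socket
                }
            }
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; a real service would use its configured port.
        try (ServerSocket server = bindWildcard(0)) {
            System.out.println("Listening on " + server.getLocalSocketAddress());
        }
    }
}
```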
### Why are the changes needed?
see above.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested on a server using netstat, and tried connecting via `nc -4` and `nc -6` to ensure the connection was there.
Closes #2713 from akpatnam25/CELEBORN-1513-fix.
Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Currently, the metrics of the master service include workers, excludedWorkers, and other metadata, but not available workers. This PR adds the missing part.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Local test

Closes #2723 from zaynt4606/availableWorker.
Authored-by: szt <zaynt4606@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Adding documentation for missing memory file storage metrics.
### Why are the changes needed?
A few new metrics were added in https://github.com/apache/celeborn/pull/2300, but their documentation was missing from monitoring.md.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
NA
Closes #2705 from s0nskar/memory_metrics.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Replace the deprecated config `celeborn.storage.activeTypes` with `celeborn.storage.availableTypes` in docs and tests, guiding newcomers to use the new config name.
### Why are the changes needed?
The config `celeborn.storage.activeTypes` has been deprecated in 0.4.0 release.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No feature changed.
Closes #2675 from bowenliang123/avai-types.
Authored-by: Bowen Liang <liangbowen@gf.com.cn>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
update client deployment doc to include the param (spark.celeborn.storage.activeTypes)
### Why are the changes needed?
Just provide a hint for users, otherwise they may miss this param.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Yes
Closes #2683 from zhaohehuhu/dev-0815.
Authored-by: zhaohehuhu <luoyedeyi@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `ConfigStore` to support `celeborn.dynamicConfig.store.backend` with short name and backend implementation.
### Why are the changes needed?
`celeborn.dynamicConfig.store.backend` is allowed to be specified in two ways:
- Using short names: Default available options are FS, DB.
- Using the fully qualified class name of the backend implementation.
Therefore, it's recommended to introduce `ConfigStore` based on the SPI mechanism for `celeborn.dynamicConfig.store.backend` instead of `dynamicConfigStoreBackendShortNames`, which does not allow users to add other short names for backend implementations.
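The two ways of specifying the backend can be sketched as a simple lookup: a known short name maps to a class, and anything else is treated as a fully qualified class name. The class names in the map and the case-insensitive matching are assumptions for illustration; this is not the SPI-based `ConfigStore` lookup itself.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of resolving a backend by short name or fully qualified class name. */
final class ConfigStoreBackendSketch {
    // Hypothetical class names, for illustration only.
    private static final Map<String, String> SHORT_NAMES = Map.of(
        "FS", "example.config.FsConfigStore",
        "DB", "example.config.DbConfigStore");

    static String resolveBackendClass(String configured) {
        String value = configured.trim();
        // Assumption: short names are matched case-insensitively.
        String byShortName = SHORT_NAMES.get(value.toUpperCase(Locale.ROOT));
        // Anything that is not a known short name is treated as a class name.
        return byShortName != null ? byShortName : value;
    }

    public static void main(String[] args) {
        System.out.println(resolveBackendClass("FS"));
        System.out.println(resolveBackendClass("com.acme.CustomConfigStore"));
    }
}
```

With an SPI-style registry, the fixed map would instead be populated from the implementations discovered at runtime, so users could contribute their own short names.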
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #2698 from SteNicholas/CELEBORN-1550.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Fixing a bug where the `networkLocation` is not persisted in Ratis, and the master defaults to `DEFAULT_RACK` when it loads the snapshot. This was missed in https://github.com/apache/celeborn/pull/2367 unfortunately, and it came up during our stress testing internally.
### Why are the changes needed?
Needed for custom network aware replication, so that networkLocation state is kept in snapshot file.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Updated unit test to ensure serde is correct.
Closes #2669 from akpatnam25/CELEBORN-1549.
Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Add support for providing a custom dynamic config store backend implementation; users can now pass their own implementation for the dynamic config store backend.
This change also keeps backward compatibility by still supporting short names for backends, such as "FS" and "DB".
### Why are the changes needed?
Currently celeborn only supports File and DB based backend while there can be other ways of managing these configs.
### Does this PR introduce _any_ user-facing change?
NO, user facing behaviour will be same.
### How was this patch tested?
Existing UTs verify that this change works for the "FS" and "DB" implementations.
Closes #2670 from s0nskar/dynamic_config.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
1.20 was the last non-bug-fix release before Flink 2.0; you can find all the main upgrade features in this [release note](https://nightlies.apache.org/flink/flink-docs-release-1.20/release-notes/flink-1.20/). I think the most important feature related to Celeborn is that we expose some interfaces to support Flink hybrid shuffle integration with Celeborn ([FLIP-459](https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn)). Supporting hybrid shuffle on the Celeborn side is also follow-up work to this PR.
Incompatible changes in 1.20:
- 1.20 uses the enum `CompressionCodec` instead of `String` to construct `BufferDecompressor` and `BufferCompressor`.
- 1.20 introduces a new method (`notifyPartitionRecoveryStarted`) to `JobShuffleContext` in a non-compatible way.
I've already done the adaptation in this PR.
Closes #2662 from reswqa/support-120.
Authored-by: Weijie Guo <reswqa@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR introduces an optional config item for a worker host pattern, and makes the master check whether the worker host matches the pattern when the worker registers.
If it does not match, the register worker request will be rejected.
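The check can be sketched as follows, assuming the pattern is a regular expression that must match the whole host name. The class name and the full-match semantics are assumptions for illustration, not the master's actual code.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of validating a worker host against an optional pattern. */
final class WorkerHostPatternSketch {
    private final Pattern allowedHosts; // null means no pattern is configured

    WorkerHostPatternSketch(String regexOrNull) {
        this.allowedHosts = regexOrNull == null ? null : Pattern.compile(regexOrNull);
    }

    /** Returns true when the registration should be accepted. */
    boolean acceptRegistration(String workerHost) {
        // Without a configured pattern, every worker may register, as before.
        return allowedHosts == null || allowedHosts.matcher(workerHost).matches();
    }

    public static void main(String[] args) {
        WorkerHostPatternSketch check = new WorkerHostPatternSketch("worker-\\d+\\.example\\.com");
        System.out.println(check.acceptRegistration("worker-12.example.com"));
        System.out.println(check.acceptRegistration("rogue.example.org"));
    }
}
```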
### Why are the changes needed?
Currently, the Celeborn master allows all workers to register. It is better to limit which workers are allowed to register.
### Does this PR introduce _any_ user-facing change?
No, the config item is optional; there is no breaking change.
### How was this patch tested?
UT.
Closes #2660 from turboFei/hosts_patterns.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR supports disabling the expiration of worker unavailable information by setting the timeout to -1.
### Why are the changes needed?
In our use case, we want to retain all the worker unavailable information.
This is acceptable when fixed ports and hosts are used, since the information will not occupy much memory.
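A minimal sketch of how a `-1` timeout can switch off expiration is shown below. The method name is hypothetical, and treating every negative value as "disabled" is an assumption; the description above only mentions -1.

```java
/** Hypothetical sketch of an expiration check that a negative timeout disables. */
final class UnavailableInfoExpirySketch {
    static boolean isExpired(long recordedAtMs, long nowMs, long timeoutMs) {
        if (timeoutMs < 0) {
            return false; // -1 disables expiration, so the record is kept forever
        }
        return nowMs - recordedAtMs > timeoutMs;
    }

    public static void main(String[] args) {
        long recordedAt = 0L;
        long now = 60_000L;
        System.out.println(isExpired(recordedAt, now, 30_000L)); // older than the timeout
        System.out.println(isExpired(recordedAt, now, -1L));     // expiration disabled
    }
}
```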
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Not needed.
Closes #2657 from turboFei/disable_Cleanup.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Propose adding a master endpoint resolver, which makes master endpoint discovery extensible; users can leverage this to create and pass the type of resolver that best fits their needs.
### Why are the changes needed?
Currently, Celeborn supports passing master endpoints through the configs `celeborn.master.endpoints` and `celeborn.master.internal.endpoints`, and the allowed pattern for these configs is `<host1>:<port1>[,<host2>:<port2>]*`. Workers and clients both use these configs to connect to the master.
The problem with this approach is that it takes a static host, IP, or domain address, which can change over time for a long-running worker or client, for example when a master node goes down or a domain UUID changes. In our infra, this discovery is done by passing a service group that actively watches the nodes for the master service, but there is no way to make it work with Celeborn, as Celeborn currently only works with static addresses.
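To make the extension point concrete, here is a sketch with a hypothetical `Resolver` interface and a static implementation that parses the `<host>:<port>[,<host>:<port>]*` form. The interface shape, the names, and the example hosts and ports are assumptions, not the API added by this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a pluggable master endpoint resolver. */
final class MasterEndpointResolverSketch {
    /** Hypothetical extension point: implementations decide how endpoints are discovered. */
    interface Resolver {
        List<String> resolve();
    }

    /** Resolver that parses a static "<host>:<port>[,<host>:<port>]*" value once. */
    static final class StaticResolver implements Resolver {
        private final List<String> endpoints;

        StaticResolver(String configValue) {
            List<String> parsed = new ArrayList<>();
            for (String part : configValue.split(",")) {
                String endpoint = part.trim();
                int colon = endpoint.lastIndexOf(':');
                if (colon <= 0 || colon == endpoint.length() - 1) {
                    throw new IllegalArgumentException("Expected host:port but got: " + endpoint);
                }
                Integer.parseInt(endpoint.substring(colon + 1)); // rejects a non-numeric port
                parsed.add(endpoint);
            }
            this.endpoints = Collections.unmodifiableList(parsed);
        }

        @Override
        public List<String> resolve() {
            return endpoints;
        }
    }

    public static void main(String[] args) {
        // Example hosts and ports are placeholders.
        Resolver resolver = new StaticResolver("master1:9097, master2:9097");
        System.out.println(resolver.resolve());
    }
}
```

A dynamic implementation of the same interface could query a service registry on each `resolve()` call, which is the kind of use case described above.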
### Does this PR introduce _any_ user-facing change?
Default behaviour will remain same but user can now pass their own master endpoint resolver.
### How was this patch tested?
Added new UTs
Closes #2629 from s0nskar/masterresolver.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce celeborn-spi module for authentication extensions.
### Why are the changes needed?
Address comments: https://github.com/apache/celeborn/pull/2632#issuecomment-2247132115
### Does this PR introduce _any_ user-facing change?
No, this interface has not been released.
### How was this patch tested?
UT.
Closes #2644 from turboFei/celeborn_spi.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
as title
### Why are the changes needed?
Now, Celeborn doesn't support sinking shuffle data directly to Amazon S3, which could be a limitation when we're trying to move on-premises servers to AWS and use S3 as a data sink for shuffled data.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Closes #2579 from zhaohehuhu/dev-0619.
Authored-by: zhaohehuhu <luoyedeyi@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>