### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
<img width="1370" height="1100" alt="image" src="https://github.com/user-attachments/assets/dce7f5b4-a166-4547-bc08-4a8162f129d7" />
Closes #3457 from cxzl25/CELEBORN-2135.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Support Flink 2.1.
### Why are the changes needed?
Flink 2.1 has already been released; see the [Release notes - Flink 2.1](https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-2.1).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #3404 from SteNicholas/CELEBORN-2093.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Bump ap-loader version from 3.0-9 to 4.0-10.
### Why are the changes needed?
`ap-loader` has already released v4.0-10; see the release note [Loader for 4.0 (v10): Heatmaps and Native memory profiling](https://github.com/jvm-profiling-tools/ap-loader/releases/tag/4.0-10). The version used by `JVMProfiler` should be bumped from 3.0-9 to 4.0-10.
Backport https://github.com/apache/spark/pull/51257.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes #3359 from SteNicholas/CELEBORN-2057.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Bump the Spark 4.0 version to 4.0.0.
### Why are the changes needed?
Spark 4.0.0 is ready.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA.
Closes #3282 from turboFei/spark_4.0.
Lead-authored-by: Fei Wang <fwang12@ebay.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Co-authored-by: Fei Wang <cn.feiwang@gmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
The default value of `celeborn.master.slot.assign.loadAware.fetchTimeWeight` is 1, but the `slotsallocation` document lists it as 0.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA
Closes #3287 from cxzl25/minor_doc_slot.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
Use `tmp` subfolder for svc staging dir.
### Why are the changes needed?
Refer:
81c3d91f75/build/release/release.sh (L67)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local.
Closes #3278 from turboFei/release_guide_follow.
Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
### What changes were proposed in this pull request?
Add release guide and fix several issues during 0.6.0 release.
### Why are the changes needed?
Add docs.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested locally.
Closes #3271 from turboFei/release_guide.
Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Fei Wang <fwang12@ebay.com>
### What changes were proposed in this pull request?
Support Flink 2.0. The major changes of Flink 2.0 include:
- https://github.com/apache/flink/pull/25406: Bump target Java version to 11 and drop support for Java 8.
- https://github.com/apache/flink/pull/25551: Replace `InputGateDeploymentDescriptor#getConsumedSubpartitionIndexRange` with `InputGateDeploymentDescriptor#getConsumedSubpartitionRange(index)`.
- https://github.com/apache/flink/pull/25314: Replace `NettyShuffleEnvironmentOptions#NETWORK_EXCLUSIVE_BUFFERS_REQUEST_TIMEOUT_MILLISECONDS` with `NettyShuffleEnvironmentOptions#NETWORK_BUFFERS_REQUEST_TIMEOUT`.
- https://github.com/apache/flink/pull/25731: Introduce `InputGate#resumeGateConsumption`.
### Why are the changes needed?
Flink 2.0 has been released; see the [Release notes - Flink 2.0](https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-2.0).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #3179 from SteNicholas/CELEBORN-1925.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Weijie Guo <reswqa@163.com>
### What changes were proposed in this pull request?
Fix typo in 'Developer' documents.
### Why are the changes needed?
Improve the accuracy of the doc.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Only doc changed. No test.
Closes #3108 from bgeng777/CELEBORN-1870.
Authored-by: KenGeng <samuelgeng7@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Bump ap-loader version from 3.0-8 to 3.0-9.
### Why are the changes needed?
`ap-loader` has already released v3.0-9, so the version used by `JVMProfiler` should be bumped from 3.0-8.
Backport:
1. https://github.com/apache/spark/pull/46402
2. https://github.com/apache/spark/pull/49440
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #3072 from SteNicholas/CELEBORN-1842.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Remove support for the outdated Flink 1.14 and 1.15.
For more information, please see the discussion thread: https://lists.apache.org/thread/njho00zmkjx5qspcrbrkogy8s4zzmwv9
### Why are the changes needed?
Reduce maintenance burden.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
Changes can be covered by existing tests.
Closes #3029 from codenohup/remove-flink14and15.
Authored-by: codenohup <huangxu.walker@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Introduction to Celeborn's Java Columnar Shuffle
### Why are the changes needed?
Introduction to Celeborn's Java Columnar Shuffle
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI
Closes #3010 from kerwin-zk/CELEBORN-1789.
Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Remove the code for app top disk usage on both the master and worker sides.
Instead, use the Prometheus expression below to find the applications with the highest disk usage.
```
topk(50, sum by (applicationId) (metrics_diskBytesWritten_Value{role="worker", applicationId!=""}))
```
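To make the aggregation in the expression above concrete, here is a minimal Python sketch of what `topk(k, sum by (applicationId) (...))` computes over a set of samples. The sample data, the `instance` label values and the helper name `top_apps` are made up for illustration; only the metric name and label matchers come from the expression.

```python
from collections import defaultdict

# Hypothetical samples standing in for metrics_diskBytesWritten_Value series.
# Each entry is (labels, value); the numbers are invented for illustration.
samples = [
    ({"role": "worker", "applicationId": "app-1", "instance": "w1"}, 100),
    ({"role": "worker", "applicationId": "app-1", "instance": "w2"}, 250),
    ({"role": "worker", "applicationId": "app-2", "instance": "w1"}, 300),
    ({"role": "worker", "applicationId": "", "instance": "w1"}, 999),
    ({"role": "master", "applicationId": "app-3", "instance": "m1"}, 500),
]


def top_apps(samples, k):
    """Mimic topk(k, sum by (applicationId) (...)) on in-memory samples."""
    totals = defaultdict(int)
    for labels, value in samples:
        # Label matchers: role="worker" and applicationId!=""
        if labels.get("role") == "worker" and labels.get("applicationId", "") != "":
            # sum by (applicationId): add up values across all other labels
            totals[labels["applicationId"]] += value
    # topk: keep the k largest sums
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


print(top_apps(samples, 50))
```

Series with an empty `applicationId` or a non-worker role are dropped before summing, which is why only per-application worker totals remain in the result.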
### Why are the changes needed?
To address comments: https://github.com/apache/celeborn/pull/2947#issuecomment-2499564978
> Due to the application dimension resource consumption, this feature should be included in the deprecated features. Maybe you can remove the codes for application top disk usage.
### Does this PR introduce _any_ user-facing change?
Yes, the app top disk usage API is removed.
### How was this patch tested?
GA.
Closes #2949 from turboFei/remove_app_top_usage.
Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Document Flink 1.16 support in `README.md` and `deploy.md`.
2. Update description of `celeborn.client.shuffle.compression.codec` to change the supported Flink version for ZSTD.
### Why are the changes needed?
#2619 added support for Flink 1.16, so the documentation should be updated accordingly. Meanwhile, ZSTD is supported for the Flink shuffle client since Flink 1.16.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2904 from SteNicholas/CELEBORN-1504.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce Blaze support document.
### Why are the changes needed?
[Blaze](https://github.com/kwai/blaze) supports Celeborn as a remote shuffle service. A Blaze support document is recommended to introduce Blaze usage.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2787 from SteNicholas/CELEBORN-1635.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Update document link of `Get Started With Velox` and `Get Started With ClickHouse` in `glutensupport.md`. Meanwhile, replace `gluten-celeborn-package-xx-SNAPSHOT.jar` with `(The bundled Gluten Jar. Make sure -Pceleborn is specified when it is built.)`, which refers to https://github.com/apache/incubator-gluten/pull/6692.
### Why are the changes needed?
The document links of `Get Started With Velox` and `Get Started With ClickHouse` are no longer accessible because their URLs have changed.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2762 from SteNicholas/CELEBORN-1486.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix the unique key to reflect correct columns names.
### Why are the changes needed?
Running the current DB scripts gives the error below because the `user` column was renamed to `name` (https://github.com/apache/celeborn/pull/2340) but the unique key was not updated accordingly.
```
mysql> CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config
-> (
-> id int NOT NULL AUTO_INCREMENT,
-> cluster_id int NOT NULL,
-> tenant_id varchar(255) NOT NULL,
-> level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,USER',
-> name varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
-> config_key varchar(255) NOT NULL,
-> config_value varchar(255) NOT NULL,
-> type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
-> gmt_create timestamp NOT NULL,
-> gmt_modify timestamp NOT NULL,
-> PRIMARY KEY (id),
-> UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `user`, `config_key`)
-> );
ERROR 1072 (42000): Key column 'user' doesn't exist in table
```
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
Tested in local DB
```
mysql> CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config
-> (
-> id int NOT NULL AUTO_INCREMENT,
-> cluster_id int NOT NULL,
-> tenant_id varchar(255) NOT NULL,
-> level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,USER',
-> name varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
-> config_key varchar(255) NOT NULL,
-> config_value varchar(255) NOT NULL,
-> type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
-> gmt_create timestamp NOT NULL,
-> gmt_modify timestamp NOT NULL,
-> PRIMARY KEY (id),
-> UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `name`, `config_key`)
-> );
Query OK, 0 rows affected (0.01 sec)
```
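The same class of mistake can be caught with a quick automated check. The sketch below is an assumption-laden illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL, a simplified version of the table (several columns are omitted and types are adapted), and a hypothetical helper `ddl_is_valid`. It only shows that a unique key referencing a column that does not exist makes the DDL fail.

```python
import sqlite3

# Simplified stand-in for the celeborn_cluster_tenant_config table.
# Column types are adapted to SQLite and several columns are omitted.
DDL_TEMPLATE = """
CREATE TABLE celeborn_cluster_tenant_config (
    id INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL,
    tenant_id TEXT NOT NULL,
    name TEXT,
    config_key TEXT NOT NULL,
    config_value TEXT NOT NULL,
    UNIQUE (cluster_id, tenant_id, {user_column}, config_key)
)
"""


def ddl_is_valid(user_column):
    """Return True if the DDL executes with the given unique-key column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(DDL_TEMPLATE.format(user_column=user_column))
        return True
    except sqlite3.Error:
        # Raised when the unique key names a column the table does not have.
        return False
    finally:
        conn.close()


print(ddl_is_valid("user"), ddl_is_valid("name"))
```

Running such a check against every schema script in CI would flag a stale column reference before it reaches users, although a real check should target the actual database engine.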
Closes #2740 from s0nskar/fix-db-script.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add support for custom dynamic config store backend implementations; users can now pass their own implementation of the dynamic config store backend.
This change also keeps backward compatibility by still supporting the short backend names "FS" and "DB".
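A minimal sketch, in Python rather than Celeborn's Java/Scala, of how a backend setting could accept either a short name or a custom implementation. The class names, the `SHORT_NAMES` registry and `resolve_backend` are hypothetical illustrations and not Celeborn's actual code; `collections.OrderedDict` merely stands in for a user-provided class.

```python
import importlib


class FsConfigStore:
    """Hypothetical file-based backend."""


class DbConfigStore:
    """Hypothetical database-based backend."""


# Short names kept for backward compatibility.
SHORT_NAMES = {"FS": FsConfigStore, "DB": DbConfigStore}


def resolve_backend(name):
    """Map a short name or a dotted 'module.Class' path to a class."""
    if name in SHORT_NAMES:
        return SHORT_NAMES[name]
    module_name, _, class_name = name.rpartition(".")
    if not module_name:
        raise ValueError(f"Unknown backend: {name!r}")
    # Load the user-supplied implementation by its fully qualified name.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


builtin = resolve_backend("FS")
custom = resolve_backend("collections.OrderedDict")
print(builtin.__name__, custom.__name__)
```

Checking the short-name table first means existing "FS" and "DB" settings resolve exactly as before, and only unrecognized values are treated as class names.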
### Why are the changes needed?
Currently Celeborn only supports file-based and DB-based backends, while there can be other ways of managing these configs.
### Does this PR introduce _any_ user-facing change?
No, user-facing behaviour stays the same.
### How was this patch tested?
Existing UTs verify that this change works for the "FS" and "DB" implementations.
Closes #2670 from s0nskar/dynamic_config.
Authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
1.20 was the last non-bug-fix release before Flink 2.0; you can find all the main upgrade features in this [release note](https://nightlies.apache.org/flink/flink-docs-release-1.20/release-notes/flink-1.20/). I think the most important feature related to Celeborn is that some interfaces are exposed to support Flink hybrid shuffle integration with Celeborn ([FLIP-459](https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn)). Supporting hybrid shuffle on the Celeborn side is also a follow-up to this PR.
Incompatible changes in 1.20:
- 1.20 uses the enum `CompressionCodec` instead of `String` to construct `BufferDecompressor` and `BufferCompressor`.
- 1.20 introduces a new method (`notifyPartitionRecoveryStarted`) to `JobShuffleContext` in a non-compatible way.
I've already done the adaptation in this PR.
Closes #2662 from reswqa/support-120.
Authored-by: Weijie Guo <reswqa@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add support for Apache Flink 1.16 in Celeborn.
### Why are the changes needed?
User requests for Apache Flink 1.16.
This implementation is a synthesis of the 1.15 and 1.17 support that already exists in Apache Celeborn.
### Does this PR introduce _any_ user-facing change?
Yes, supports Apache Flink 1.16
### How was this patch tested?
Tests for 1.16 added, which are based on 1.15 and 1.17
Closes #2619 from mridulm/flink-1.16-support.
Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `ClickHouse Backend` in `Gluten Support` document. Meanwhile, fix the profile via `-Pceleborn` to compile gluten module.
### Why are the changes needed?
Gluten with the ClickHouse backend currently supports Celeborn as a remote shuffle service. The Gluten Support document should introduce the ClickHouse backend to guide users of Gluten with the ClickHouse backend.
Backport https://github.com/apache/incubator-gluten/pull/6282.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2594 from SteNicholas/CELEBORN-1486.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
### What changes were proposed in this pull request?
Add helm chart unit tests.
### Why are the changes needed?
Unit tests can ensure that resource manifests are rendered as expected with various configurations.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Detailed information about how to run helm chart unit tests can be found here [helm-unittest/helm-unittest](https://github.com/helm-unittest/helm-unittest). First, you need to install helm unit test plugin:
```shell
helm plugin install https://github.com/helm-unittest/helm-unittest.git
```
Then, run helm chart unit tests as follows:
```shell
$ helm unittest charts/celeborn --file "tests/**/*_test.yaml" --strict --debug
load_plugins.go:110: [info] file (/Users/chenyi/Library/helm/plugins/helm-acr/completion.yaml) not provided by plugin. No plugin auto-completion possible
### Chart [ celeborn ] charts/celeborn
PASS Test Celeborn configmap charts/celeborn/tests/configmap_test.yaml
PASS Test Celeborn master pod monitor charts/celeborn/tests/master/podmonitor_test.yaml
PASS Test Celeborn master priority class charts/celeborn/tests/master/priorityclass_test.yaml
PASS Test Celeborn master service charts/celeborn/tests/master/service_test.yaml
PASS Test Celeborn master statefulset charts/celeborn/tests/master/statefulset_test.yaml
PASS Test Celeborn worker pod monitor charts/celeborn/tests/worker/podmonitor_test.yaml
PASS Test Celeborn worker priority class charts/celeborn/tests/worker/priorityclass_test.yaml
PASS Test Celeborn worker service charts/celeborn/tests/worker/service_test.yaml
PASS Test Celeborn worker statefulset charts/celeborn/tests/worker/statefulset_test.yaml
Charts: 1 passed, 1 total
Test Suites: 9 passed, 9 total
Tests: 48 passed, 48 total
Snapshot: 0 passed, 0 total
Time: 183.011375ms
```
Closes #2511 from ChenYi015/helm-unittest.
Authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Remove incubator/incubating for graduation including:
- Remove `incubator`/`Incubating`.
- Remove `DISCLAIMER` and corresponding link.
- Update Release scripts and template.
Fix #2415.
### Why are the changes needed?
The ASF board has approved a resolution to graduate Celeborn into a full Top Level Project. To transition from the Apache Incubator to a new TLP, there are a few action items we need to complete.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2421 from SteNicholas/infra-graduation.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce JVM profiling via `JVMProfiler` in Celeborn Worker, using async-profiler to capture CPU and memory profiles.
### Why are the changes needed?
[async-profiler](https://github.com/async-profiler) is a sampling profiler for any JDK based on the HotSpot JVM that does not suffer from Safepoint bias problem. It has low overhead and doesn’t rely on JVMTI. It avoids the safepoint bias problem by using the `AsyncGetCallTrace` API provided by HotSpot JVM to profile the Java code paths, and Linux’s perf_events to profile the native code paths. It features HotSpot-specific APIs to collect stack traces and to track memory allocations.
The feature introduces a profiler plugin that does not add any overhead unless enabled and can be configured to accept profiler arguments as a configuration parameter. It should support turning profiling on/off and include the jar/binaries needed for profiling.
Backport [[SPARK-46094] Support Executor JVM Profiling](https://github.com/apache/spark/pull/44021).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Worker cluster test.
Closes #2409 from SteNicholas/CELEBORN-1299.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve the Celeborn documentation by fixing typos, formatting, invalid links and out-of-sync default values. Meanwhile, keep the public interfaces in `shuffleclient.md` consistent with `ShuffleClient`.
### Why are the changes needed?
The Celeborn documentation currently contains typos, formatting issues, invalid links and out-of-sync default values. Meanwhile, the public interfaces in `shuffleclient.md` are inconsistent with `ShuffleClient`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2410 from SteNicholas/CELEBORN-1341.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve the Celeborn documentation by fixing typos, table formats and wrong descriptions. Meanwhile, add the documentation of MapReduce client deployment to `deploy.md`.
### Why are the changes needed?
The Celeborn documentation currently contains typos and formatting issues. Meanwhile, `deploy.md` does not cover the deployment of the MapReduce client, which is inconsistent with `README.md` for Flink configuration.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2407 from SteNicholas/CELEBORN-1341.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support Flink 1.19.
### Why are the changes needed?
Flink 1.19.0 has been released: [Announcing the Release of Apache Flink 1.19](https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19).
The main changes include:
- The `org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel` constructor changes its parameters:
  - `consumedSubpartitionIndex` changes to `consumedSubpartitionIndexSet`: [[FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel](https://github.com/apache/flink/pull/23927).
  - adds `partitionRequestListenerTimeout`: [[FLINK-25055][network] Support listen and notify mechanism for partition request](https://github.com/apache/flink/pull/23565).
- The `org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor removes the parameters `subpartitionIndexRange`, `tieredStorageConsumerClient`, `nettyService` and `tieredStorageConsumerSpecs`: [[FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel](https://github.com/apache/flink/pull/23927).
- Change the default config file to `config.yaml` in `flink-dist`: [[FLINK-33577][dist] Change the default config file to config.yaml in flink-dist](https://github.com/apache/flink/pull/24177).
- `org.apache.flink.configuration.RestartStrategyOptions` uses `org.apache.commons.compress.utils.Sets` of the `commons-compress` dependency: [[FLINK-33865][runtime] Adding an ITCase to ensure exponential-delay.attempts-before-reset-backoff works well](https://github.com/apache/flink/pull/23942).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local test:
- Flink batch job submission
```
$ ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID 2e9fb659991a9c29d376151783bdf6de
Program execution finished
Job with JobID 2e9fb659991a9c29d376151783bdf6de has finished.
Job Runtime: 1912 ms
```
- Flink batch job execution

- Celeborn master log
```
24/03/18 20:52:47,513 INFO [celeborn-dispatcher-42] Master: Offer slots successfully for 1 reducers of 1710766312631-2e9fb659991a9c29d376151783bdf6de-0 on 1 workers.
```
- Celeborn worker log
```
24/03/18 20:52:47,704 INFO [celeborn-dispatcher-1] StorageManager: created file at /Users/nicholas/Software/Celeborn/apache-celeborn-0.5.0-SNAPSHOT/shuffle/celeborn-worker/shuffle_data/1710766312631-2e9fb659991a9c29d376151783bdf6de/0/0-0-0
24/03/18 20:52:47,707 INFO [celeborn-dispatcher-1] Controller: Reserved 1 primary location and 0 replica location for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0
24/03/18 20:52:47,874 INFO [celeborn-dispatcher-2] Controller: Start commitFiles for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0
24/03/18 20:52:47,890 INFO [worker-rpc-async-replier] Controller: CommitFiles for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0 success with 1 committed primary partitions, 0 empty primary partitions, 0 failed primary partitions, 0 committed replica partitions, 0 empty replica partitions, 0 failed replica partitions.
```
Closes #2399 from SteNicholas/CELEBORN-1310.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
To fix a typo.
### Why are the changes needed?
To maintain the quality of Celeborn documentation.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
N/A
Closes #2397 from ForVic/forvic/fix_typo.
Authored-by: ForVic <victor.lakers0@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix style and Gluten link in Developers Doc.
### Why are the changes needed?
- `slotsallocation.md` has the following wrong style:
<img width="1434" alt="image" src="https://github.com/apache/incubator-celeborn/assets/10048174/97fb53ed-473d-4f3d-8231-1fb613df9132">
- Gluten is an Apache incubating project, so the link to the Gluten project should be [Gluten](https://github.com/apache/incubator-gluten).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2375 from SteNicholas/developers-doc.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support Spark 2.4 with Scala 2.12 in `sbt.md`. Meanwhile, add a test for Spark 2.4 and Scala 2.12 to the CI workflow.
Follow up #2344.
### Why are the changes needed?
Spark 2.4 with Scala 2.12 is supported.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #2345 from SteNicholas/CELEBORN-1298.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Add trace mark symbol.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2338 from FMX/B1295.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `configuration.md` to document dynamic config and config service.
### Why are the changes needed?
`DynamicConfig` and `ConfigService` have already been supported in #2100, which should be documented to introduce the feature.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2336 from SteNicholas/CELEBORN-1286.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2201 from cxzl25/CELEBORN-1207.
Lead-authored-by: sychen <sychen@ctrip.com>
Co-authored-by: cxzl25 <3898450+cxzl25@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
1. Remove UNKNOWN_DISK from StorageInfo.
2. Enable load-aware slots allocation when there is HDFS.
### Why are the changes needed?
To support the application's config about available storage types.
### Does this PR introduce _any_ user-facing change?
no.
### How was this patch tested?
GA and Cluster.
Closes #2098 from FMX/B1081-1.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
Add the `--release` parameter to create a Celeborn distribution like those distributed on the Celeborn Downloads page.
### Why are the changes needed?
Without the `--release` parameter, the created Celeborn distribution differs from the one on the Celeborn Downloads page and lacks client-related packages.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
PASS GA
Closes #2080 from jiaoqingbo/minor-sbt.
Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
```bash
flink-1.18.0
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
```
```java
Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V
at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225)
at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179)
at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90)
at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120)
at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524)
at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496)
```
Flink 1.18.0 release
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
Interface `org.apache.flink.runtime.io.network.buffer.Buffer` adds `setRecycler` method.
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers
`org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor adds parameters.
[[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate
[[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```bash
flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a
Program execution finished
Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished.
Job Runtime: 1635 ms
```
<img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99">
Closes #2063 from cxzl25/CELEBORN-1105.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2062 from cxzl25/CELEBORN-1104.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Extend `README#Build` and `sbt#System Requirements` to Scala 2.13.
### Why are the changes needed?
`README#Build` and `sbt#System Requirements` should extend to Scala 2.13 to align with the SBT CI test results.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
SBT CI tests.
Closes #1987 from SteNicholas/CELEBORN-987.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
Fix some typos
### Why are the changes needed?
Ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
-
Closes #1983 from onebox-li/fix-typo.
Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Clarify a Spark config for working with Celeborn.
### Why are the changes needed?
After some tests, I found that Spark 3.1 and newer can work with Celeborn with `spark.shuffle.service.enabled=true`.
Since Spark 3.1, `ExternalShuffleBlockResolver` no longer checks the shuffle manager's type.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
I tested two scenarios about this PR.
1. Check whether Spark can release the executors in time.
2. Check data correctness by running TPC-DS.
All checks are good.
Closes #1955 from FMX/CELEBORN-1010.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As Title
### Why are the changes needed?
Since 0.3.1, Celeborn changed the default value of `celeborn.worker.directMemoryRatioToResume` from `0.5` to `0.7`.
The doc should be updated accordingly.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
PASS GA
Closes #1931 from jiaoqingbo/ratiofix.
Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
https://celeborn.apache.org/docs/latest/developers/overview/
> For more details, please refer to Rolling upgrade
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1927 from cxzl25/CELEBORN-997.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix a broken link in docs/developers/overview.md.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Locally tested.
Closes #1845 from zhouyifan279/upgrade-page-link.
Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #1806 from cfmcgrady/sbt-docs-followup.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual test
Closes #1795 from cfmcgrady/sbt-docs.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes #1784 from kerwin-zk/gluten_celeborn.
Lead-authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Co-authored-by: Kerwin Zhang <xiyu.zk@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes #1788 from waitinfuture/869-fu.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>