### What changes were proposed in this pull request?
Improve Celeborn document to fix typos, table formats and wrong description of document. Meanwhile, `deploy.md` adds the document of MapReduce client deployment.
### Why are the changes needed?
There are some typos and format fixes in Celeborn document at present. Meanwhile, the `deploy.md` does not contain the deployment of MapReduce client, which is inconsistent with `README.md` for Flink configuration.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2407 from SteNicholas/CELEBORN-1341.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support Flink 1.19.
### Why are the changes needed?
Flink 1.19.0 is announced to release: [Announcing the Release of Apache Flink 1.19] (https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19).
The main changes includes:
- `org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel` constructor change parameters:
- `consumedSubpartitionIndex` changes to `consumedSubpartitionIndexSet`: [[FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel](https://github.com/apache/flink/pull/23927).
- adds `partitionRequestListenerTimeout`: [[FLINK-25055][network] Support listen and notify mechanism for partition request](https://github.com/apache/flink/pull/23565).
- `org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor removes parameters `subpartitionIndexRange`, `tieredStorageConsumerClient`, `nettyService` and `tieredStorageConsumerSpecs`: [[FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel](https://github.com/apache/flink/pull/23927).
- Change the default config file to `config.yaml` in `flink-dist`: [[FLINK-33577][dist] Change the default config file to config.yaml in flink-dist](https://github.com/apache/flink/pull/24177).
- `org.apache.flink.configuration.RestartStrategyOptions` uses `org.apache.commons.compress.utils.Sets` of `commons-compress` dependency: [[FLINK-33865][runtime] Adding an ITCase to ensure exponential-delay.attempts-before-reset-backoff works well](https://github.com/apache/flink/pull/23942).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local test:
- Flink batch job submission
```
$ ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID 2e9fb659991a9c29d376151783bdf6de
Program execution finished
Job with JobID 2e9fb659991a9c29d376151783bdf6de has finished.
Job Runtime: 1912 ms
```
- Flink batch job execution

- Celeborn master log
```
24/03/18 20:52:47,513 INFO [celeborn-dispatcher-42] Master: Offer slots successfully for 1 reducers of 1710766312631-2e9fb659991a9c29d376151783bdf6de-0 on 1 workers.
```
- Celeborn worker log
```
24/03/18 20:52:47,704 INFO [celeborn-dispatcher-1] StorageManager: created file at /Users/nicholas/Software/Celeborn/apache-celeborn-0.5.0-SNAPSHOT/shuffle/celeborn-worker/shuffle_data/1710766312631-2e9fb659991a9c29d376151783bdf6de/0/0-0-0
24/03/18 20:52:47,707 INFO [celeborn-dispatcher-1] Controller: Reserved 1 primary location and 0 replica location for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0
24/03/18 20:52:47,874 INFO [celeborn-dispatcher-2] Controller: Start commitFiles for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0
24/03/18 20:52:47,890 INFO [worker-rpc-async-replier] Controller: CommitFiles for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0 success with 1 committed primary partitions, 0 empty primary partitions, 0 failed primary partitions, 0 committed replica partitions, 0 empty replica partitions, 0 failed replica partitions.
```
Closes#2399 from SteNicholas/CELEBORN-1310.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
To fix a typo.
### Why are the changes needed?
To maintain the quality of Celeborn documentation.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
N/A
Closes#2397 from ForVic/forvic/fix_typo.
Authored-by: ForVic <victor.lakers0@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix style and Gluten link in Developers Doc.
### Why are the changes needed?
- `slotsallocation.md` has the following wrong style:
<img width="1434" alt="image" src="https://github.com/apache/incubator-celeborn/assets/10048174/97fb53ed-473d-4f3d-8231-1fb613df9132">
- Gluten is apache incubating projetc, of which the link of Gluten project should be [Gluten](https://github.com/apache/incubator-gluten).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2375 from SteNicholas/developers-doc.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support Spark2.4 with Scala2.12 in `sbt.md`. Meanwhile, the CI workflow adds the test for Spark2.4 and Scala2.12.
Follow up #2344.
### Why are the changes needed?
Spark2.4 with Scala2.12 is supported.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes#2345 from SteNicholas/CELEBORN-1298.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Add trace mark symbol.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#2338 from FMX/B1295.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `configuration.md` to document dynamic config and config service.
### Why are the changes needed?
`DynamicConfig` and `ConfigService` have already been supported in #2100, which should be documented to introduce the feature.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2336 from SteNicholas/CELEBORN-1286.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#2201 from cxzl25/CELEBORN-1207.
Lead-authored-by: sychen <sychen@ctrip.com>
Co-authored-by: cxzl25 <3898450+cxzl25@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
1. Remove UNKNOWN_DISK from StorageInfo.
2. Enable load-aware slots allocation when there is HDFS.
### Why are the changes needed?
To support the application's config about available storage types.
### Does this PR introduce _any_ user-facing change?
no.
### How was this patch tested?
GA and Cluster.
Closes#2098 from FMX/B1081-1.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
add --release parameter to create a Celeborn distribution like those distributed by the Celeborn Downloads page
### Why are the changes needed?
Without --release parameter, the created Celeborn distribution is different from the Celeborn Downloads page and lacks client-related packages.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
PASS GA
Closes#2080 from jiaoqingbo/minor-sbt.
Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
```bash
flink-1.18.0
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
```
```java
Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V
at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225)
at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179)
at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90)
at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120)
at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524)
at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496)
```
Flink 1.18.0 release
https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
Interface `org.apache.flink.runtime.io.network.buffer.Buffer` adds `setRecycler` method.
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers
`org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor adds parameters.
[[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate
[[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```bash
flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a
Program execution finished
Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished.
Job Runtime: 1635 ms
```
<img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99">
Closes#2063 from cxzl25/CELEBORN-1105.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#2062 from cxzl25/CELEBORN-1104.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
`README#Build` and `sbt#System Requirements` extends to Scala 2.13.
### Why are the changes needed?
`README#Build` and `sbt#System Requirements`should extend to Scala 2.13 to align the SBT CI test results.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
SBT CI tests.
Closes#1987 from SteNicholas/CELEBORN-987.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Fu Chen <cfmcgrady@gmail.com>
### What changes were proposed in this pull request?
Fix some typos
### Why are the changes needed?
Ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
-
Closes#1983 from onebox-li/fix-typo.
Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
To clarify a spark config to work with Celeborn.
### Why are the changes needed?
After some tests, I found that Spark 3.1 and newer can work with Celeborn with `spark.shuffle.service.enabled=true`.
ExternalShuffleBlockResolver won't check the shuffle manager's type since Spark 3.1 and newer.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
I tested two scenarios about this PR.
1. Check whether Spark can release the executors in time.
2. Check data correctness by running TPC-DS.
All checks are good.
Closes#1955 from FMX/CELEBORN-1010.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
<!--
Thanks for sending a pull request! Here are some tips for you:
- Make sure the PR title start w/ a JIRA ticket, e.g. '[CELEBORN-XXXX] Your PR title ...'.
- Be sure to keep the PR description updated to reflect all changes.
- Please write your PR title to summarize what this PR proposes.
- If possible, provide a concise example to reproduce the issue for a faster review.
-->
### What changes were proposed in this pull request?
As Title
### Why are the changes needed?
Since 0.3.1, Celeborn changed the default value of `celeborn.worker.directMemoryRatioToResume` from `0.5` to `0.7`.
the doc should be update
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
PASS GA
Closes#1931 from jiaoqingbo/ratiofix.
Authored-by: jiaoqingbo <1178404354@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
https://celeborn.apache.org/docs/latest/developers/overview/
> For more details, please refer to Rolling upgrade
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1927 from cxzl25/CELEBORN-997.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix a broken link in docs/developers/overview.md.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Locally tested.
Closes#1845 from zhouyifan279/upgrade-page-link.
Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes#1806 from cfmcgrady/sbt-docs-followup.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual test
Closes#1795 from cfmcgrady/sbt-docs.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1784 from kerwin-zk/gluten_celeborn.
Lead-authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Co-authored-by: Kerwin Zhang <xiyu.zk@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1788 from waitinfuture/869-fu.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1787 from waitinfuture/869.
Lead-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1782 from waitinfuture/864.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1778 from waitinfuture/860-1.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1775 from waitinfuture/853.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1769 from waitinfuture/834.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1772 from waitinfuture/849.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1754 from waitinfuture/831.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#1752 from waitinfuture/826.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#1747 from waitinfuture/824.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#1746 from waitinfuture/823.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>