### What changes were proposed in this pull request?
Bump guava from 32.1.3-jre to 33.1.0-jre.
### Why are the changes needed?
Guava v33.1.0 has been released; see the [v33.1.0 release notes](https://github.com/google/guava/releases/tag/v33.1.0). It brings bug fixes and optimizations, including:
* cache: Fixed a bug affecting `CacheLoader`/`CacheBuilder`, described in https://github.com/google/guava/pull/6851#issuecomment-1931276822.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2439 from SteNicholas/CELEBORN-1366.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Update infrastructure for SSL support.
Please see #2416 for the consolidated PR with all the changes for reference.
### Why are the changes needed?
At a high level, the changes are:
* Add `ManagedBuffer.convertToNettyForSsl`, to support SSL encryption.
* Add `EncryptedMessageWithHeader`, which wraps the message and body for use with SSL.
* Add `SslMessageEncoder`, an encoder for SSL.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The overall PR #2416 (and this PR as well) passes all tests, and this PR includes the relevant subset of tests.
Closes #2427 from mridulm/update-infra-for-ssl.
Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
`ServerConnector` supports `celeborn.master.http.stopTimeout` and `celeborn.worker.http.stopTimeout`.
### Why are the changes needed?
Jetty `Server` honors `celeborn.master.http.stopTimeout` and `celeborn.worker.http.stopTimeout`, but `ServerConnector` does not; its default stop timeout is 5min.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local test.
Closes #2437 from SteNicholas/CELEBORN-1317.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
`AbstractRemoteShuffleInputGateFactory` supports `celeborn.client.shuffle.compression.codec` to configure compression codec.
### Why are the changes needed?
`AbstractRemoteShuffleInputGateFactory` currently hard-codes the LZ4 compression codec. It should honor `celeborn.client.shuffle.compression.codec` so that other codecs, such as ZSTD, can be configured.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2435 from SteNicholas/CELEBORN-1363.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Remove unnecessary configuration `celeborn.client.flink.inputGate.minMemory` and `celeborn.client.flink.resultPartition.minMemory`.
### Why are the changes needed?
`celeborn.client.flink.inputGate.minMemory` and `celeborn.client.flink.resultPartition.minMemory` currently configure the minimum memory reserved. However, `celeborn.client.flink.inputGate.memory` must already be at least `networkBufferSize * MIN_BUFFERS_PER_GATE` bytes, and `celeborn.client.flink.resultPartition.memory` must already be at least `networkBufferSize * MIN_BUFFERS_PER_PARTITION` bytes. The two `minMemory` options therefore add nothing on top of `celeborn.client.flink.inputGate.memory` and `celeborn.client.flink.resultPartition.memory`.
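The lower bound described above can be sketched as a simple check. This is an illustration only: the class name `FlinkMemoryCheck`, the method names, and the numeric values used with it are hypothetical, not taken from the Celeborn code base.

```java
// Hypothetical helper: validates that a configured memory size covers
// networkBufferSize * minBuffers bytes, as the description above requires.
final class FlinkMemoryCheck {
    // Minimum number of bytes needed for the given buffer size and buffer count.
    static long requiredBytes(long networkBufferSize, int minBuffers) {
        return Math.multiplyExact(networkBufferSize, (long) minBuffers);
    }

    // Throws IllegalArgumentException when the configured value is below the minimum.
    static void requireAtLeast(String key, long configuredBytes, long networkBufferSize, int minBuffers) {
        long required = requiredBytes(networkBufferSize, minBuffers);
        if (configuredBytes < required) {
            throw new IllegalArgumentException(
                String.format("%s must be at least %d bytes, but was %d", key, required, configuredBytes));
        }
    }
}
```

With a check of this shape, a separate `minMemory` option has nothing left to enforce, which is the reasoning behind removing it.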
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`PluginSideConfSuiteJ#testCoalesce`
Closes #2433 from SteNicholas/CELEBORN-1362.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support Netty level logging at the network layer for Celeborn. To configure Netty level logging, a `LogHandler` must be added to the channel pipeline. `NettyLogger` is introduced as a new class that constructs a log handler depending on the log level:
- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="DEBUG" additivity="false">`: a custom log handler is created which does not dump the message contents. This way the log is a bit more compact. Moreover when network level encryption is switched on this level might be sufficient.
- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="TRACE" additivity="false">`: Netty's own log handler is used which dumps the message contents.
- Otherwise (when the logger is not `TRACE` or `DEBUG`) the pipeline does not contain a log handler. There is no runtime penalty for the default setting, but a long-running service must be restarted for a new log level to take effect.
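The three cases above amount to a small decision on the logger's enabled levels. The sketch below models only that decision; the names `NettyLogHandlerChoice` and `Kind` are hypothetical, and no Netty classes are used, so it does not show how `NettyLogger` is actually implemented.

```java
// Hypothetical model of the level-based choice described above.
final class NettyLogHandlerChoice {
    enum Kind {
        NONE,          // no handler added to the pipeline
        NO_CONTENT,    // custom handler that omits message contents (DEBUG)
        FULL_CONTENT   // Netty's own handler that dumps message contents (TRACE)
    }

    // TRACE is checked first because a logger at TRACE also has DEBUG enabled.
    static Kind choose(boolean traceEnabled, boolean debugEnabled) {
        if (traceEnabled) {
            return Kind.FULL_CONTENT;
        }
        if (debugEnabled) {
            return Kind.NO_CONTENT;
        }
        return Kind.NONE;
    }
}
```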
Backport:
- [[SPARK-36719][CORE] Supporting Netty Logging at the network layer](https://github.com/apache/spark/pull/33962)
- [[SPARK-45377][CORE] Handle InputStream in NettyLogger](https://github.com/apache/spark/pull/43165)
### Why are the changes needed?
This level of logging proved to be sufficient when debugging some external shuffle related problems. Compared with tcpdump, these log lines can be more easily correlated with Celeborn internal calls. Moreover, the log layout can be configured to contain thread names, so that for a timeout a busy thread can be identified.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual local test.
Closes #2423 from SteNicholas/CELEBORN-1359.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Remove `Incubating` from REST API Documentation.
### Why are the changes needed?
The ASF board has approved a resolution to graduate Celeborn into a full Top Level Project. The REST API Documentation should remove `Incubating`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2425 from SteNicholas/CELEBORN-1317.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix the unit test failure caused by the HTTP server port already being in use.
For the Jetty HttpServer, when binding the port fails, the exception is an `IOException` whose cause is a `BindException`; we should retry in that case.
Before:
```
case e: BindException => // retry to setup mini cluster
```
Now:
```
case e: IOException
if e.isInstanceOf[BindException] || Option(e.getCause).exists(
_.isInstanceOf[BindException]) => // retry to setup mini cluster
```
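The same condition, written as a standalone Java predicate, is sketched below. The class and method names are hypothetical; the logic mirrors the Scala guard above, where a retry is warranted if the exception is itself a `BindException` or is an `IOException` caused by one.

```java
import java.io.IOException;
import java.net.BindException;

// Hypothetical helper mirroring the Scala pattern guard shown above.
final class BindFailureCheck {
    static boolean isBindFailure(Throwable e) {
        // BindException is a subclass of IOException, so it is covered directly.
        if (e instanceof BindException) {
            return true;
        }
        // instanceof is false for a null cause, so no explicit null check is needed.
        return e instanceof IOException && e.getCause() instanceof BindException;
    }
}
```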
### Why are the changes needed?
The unit tests fail intermittently when the HTTP server port is already in use.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Will trigger GA three times.
Closes #2424 from turboFei/set_connector_stop_timeout.
Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Previously, there was no HTTP request spec such as query parameters, HTTP method and response media type, and each API needed its own HttpEndpoint class.
In this PR, we refine the code for the HTTP service and provide a Swagger UI.
Note: this PR does not change the original API request and response behavior, including the metrics APIs.
TODO:
1. define DTO
2. http request authentication
<img width="1900" alt="image" src="https://github.com/apache/incubator-celeborn/assets/6757692/7f8c2363-170d-4bdf-b2c9-74260e31d3e5">
<img width="1138" alt="image" src="https://github.com/apache/incubator-celeborn/assets/6757692/3ae6ec8e-00a8-475b-bb37-0329536185f6">
### Why are the changes needed?
To close CELEBORN-1317
### Does this PR introduce _any_ user-facing change?
No. The API behavior is the same as before.
### How was this patch tested?
UT.
Closes #2371 from turboFei/jetty.
Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix `copyright` of `mkdocs.yml` for graduation.
### Why are the changes needed?
Follow https://github.com/apache/celeborn-website/pull/44#discussion_r1540556944.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2422 from SteNicholas/infra-graduation.
Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Compile Spark-3.5 with
`./build/make-distribution.sh -Pspark-3.5 -Pjdk-21`
or
`./build/make-distribution.sh --sbt-enabled -Pspark-3.5 -Pjdk-21`
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
manual tests
Closes #2385 from waitinfuture/1327.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Add SSL related configs and support for `ReloadingX509TrustManager`, required for enabling SSL support.
Please see #2416 for the consolidated PR with all the changes for reference.
### Why are the changes needed?
Introduces SSL related configs for enabling and configuring use of TLS.
### Does this PR introduce _any_ user-facing change?
Yes, introduces configs to control the behavior of SSL.
### How was this patch tested?
The overall PR #2411 (and this PR as well) passes all tests; this PR specifically pulls out the `ReloadingX509TrustManager` and config related changes.
Closes #2419 from mridulm/config-for-ssl.
Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Remove incubator/incubating for graduation including:
- Remove `incubator`/`Incubating`.
- Remove `DISCLAIMER` and corresponding link.
- Update Release scripts and template.
Fix #2415.
### Why are the changes needed?
The ASF board has approved a resolution to graduate Celeborn into a full Top Level Project. To transition from the Apache Incubator to a new TLP, there are a few action items we need to complete.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2421 from SteNicholas/infra-graduation.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Build changes and test resources for enabling SSL support.
Please see #2416 for the consolidated PR with all the changes for reference.
Note: I closed the older PR #2413 and reopened this one given the repo changes.
### Why are the changes needed?
Build dependency updates and addition of test resources for use with tests.
The specific tests leveraging these will be added in subsequent jiras linked off of CELEBORN-1343.
Splitting it up into multiple PRs to reduce the review load.
### Does this PR introduce _any_ user-facing change?
io.netty:netty-tcnative-boringssl-static is an additional dependency.
org.bouncycastle:* are test dependencies which should have no user facing changes.
### How was this patch tested?
The overall PR #2411 passes all tests; this PR specifically pulls out the dependency changes and resources.
Closes #2417 from mridulm/build-and-test-for-tls.
Lead-authored-by: Mridul Muralidharan <mridul@gmail.com>
Co-authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Batch OpenStream RPCs by Worker to avoid too many RPCs.
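The idea of batching by worker can be sketched as grouping the streams to open by their target worker, so that one request per worker carries all of its file names. The types and names below (`OpenStreamBatcher`, `PartitionLocation`, `groupByWorker`) are hypothetical and only illustrate the grouping step, not the actual RPC messages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group stream-open requests by the worker that serves them.
final class OpenStreamBatcher {
    static final class PartitionLocation {
        final String worker;    // e.g. "host:port" of the worker
        final String fileName;  // shuffle file to open on that worker

        PartitionLocation(String worker, String fileName) {
            this.worker = worker;
            this.fileName = fileName;
        }
    }

    // Returns one batch of file names per worker, preserving first-seen worker order.
    static Map<String, List<String>> groupByWorker(List<PartitionLocation> locations) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (PartitionLocation loc : locations) {
            batches.computeIfAbsent(loc.worker, w -> new ArrayList<>()).add(loc.fileName);
        }
        return batches;
    }
}
```

With N locations spread over W workers, this reduces the number of requests from N to W.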
### Why are the changes needed?
ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Passes GA and Manual tests.
Closes #2362 from waitinfuture/1144.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
`AbstractRemoteShuffleResultPartitionFactory` removes the check of shuffle compression codec.
### Why are the changes needed?
`AbstractRemoteShuffleResultPartitionFactory` currently checks whether the shuffle compression codec is LZ4, which is needed for Flink 1.14 and 1.15. However, ZSTD has been supported since Flink 1.17. Therefore `AbstractRemoteShuffleResultPartitionFactory` should remove the codec check for Flink 1.17 and above, where the codec is already checked by the constructor of `BufferCompressor`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- `RemoteShuffleResultPartitionFactorySuiteJ`
Closes #2414 from SteNicholas/CELEBORN-1357.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Currently, the Celeborn master calculates the estimatedPartitionSize based on the fileInfo committed by the application. This estimate is then used to allocate slots across all workers. However, this partition size may be too large or too small for Celeborn. For example, if an application commits a single file of 1TB to only one worker, using that partition size could result in all other workers having no available slots or only very small slots. To improve this, it would be better to implement a cap on the master's estimated partition size to prevent such imbalances.
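A cap of this kind can be sketched as clamping the estimate into a configured range. The class name, method name and the bounds used here are hypothetical; the lower bound is an assumption added for the "too small" case mentioned above.

```java
// Hypothetical sketch: keep the estimated partition size within [minSize, maxSize].
final class PartitionSizeCap {
    static long clamp(long estimatedSize, long minSize, long maxSize) {
        if (minSize > maxSize) {
            throw new IllegalArgumentException("minSize must not exceed maxSize");
        }
        // Values above maxSize are lowered to maxSize; values below minSize are raised to minSize.
        return Math.max(minSize, Math.min(maxSize, estimatedSize));
    }
}
```

In the 1TB example above, a 1TB estimate would be reduced to `maxSize` before slots are allocated, so a single outlier file no longer dominates the allocation.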
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT
Closes #2412 from RexXiong/CELEBORN-1345.
Lead-authored-by: lvshuang.xjs <lvshuang.xjs@taobao.com>
Co-authored-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce JVM profiling `JVMProfiler` in Celeborn Worker using async-profiler to capture CPU and memory profiles.
### Why are the changes needed?
[async-profiler](https://github.com/async-profiler) is a sampling profiler for any JDK based on the HotSpot JVM that does not suffer from the safepoint bias problem. It has low overhead and doesn’t rely on JVMTI. It avoids the safepoint bias problem by using the `AsyncGetCallTrace` API provided by the HotSpot JVM to profile the Java code paths, and Linux’s perf_events to profile the native code paths. It features HotSpot-specific APIs to collect stack traces and to track memory allocations.
The feature introduces a profiler plugin that adds no overhead unless enabled and that accepts profiler arguments as a configuration parameter. It supports turning profiling on and off, and includes the jar/binaries needed for profiling.
Backport [[SPARK-46094] Support Executor JVM Profiling](https://github.com/apache/spark/pull/44021).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Worker cluster test.
Closes #2409 from SteNicholas/CELEBORN-1299.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a new parameter to cap the maximum memory that can be used for the sort writer buffer.
### Why are the changes needed?
With a huge number of partitions, the threshold computed as buffer size * number of partitions can be too large without this cap, e.g. 64K * 100000 ≈ 6G.
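The arithmetic can be sketched as follows. The class and method names are hypothetical, and the 1 GiB cap in the usage note is an arbitrary illustrative value, not the default of the new parameter.

```java
// Hypothetical sketch of capping the sort writer buffer threshold.
final class SortBufferThreshold {
    // Uncapped threshold: per-partition buffer size times the number of partitions.
    static long uncapped(long perPartitionBufferBytes, int numPartitions) {
        return Math.multiplyExact(perPartitionBufferBytes, (long) numPartitions);
    }

    // Capped threshold: never larger than maxBytes.
    static long capped(long perPartitionBufferBytes, int numPartitions, long maxBytes) {
        return Math.min(uncapped(perPartitionBufferBytes, numPartitions), maxBytes);
    }
}
```

For a 64 KiB buffer and 100000 partitions, `uncapped` gives 6,553,600,000 bytes (about 6.1 GiB); with a cap of 1 GiB, `capped` returns 1,073,741,824 bytes instead.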
### Does this PR introduce _any_ user-facing change?
a new parameter
### How was this patch tested?
ut
Closes #2388 from CodingCat/adaptive_followup.
Lead-authored-by: CodingCat <zhunansjtu@gmail.com>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve the Celeborn documentation by fixing typos, formatting, invalid links and out-of-sync default values. Also keep the public interfaces described in `shuffleclient.md` consistent with `ShuffleClient`.
### Why are the changes needed?
The Celeborn documentation currently contains typos, formatting problems, invalid links and out-of-sync default values. In addition, the public interfaces described in `shuffleclient.md` are inconsistent with `ShuffleClient`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2410 from SteNicholas/CELEBORN-1341.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Unify license format of `pom.xml`.
### Why are the changes needed?
The license format differs among modules. The standard license format has an indent before `~`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2408 from SteNicholas/maven-license-format.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve the Celeborn documentation by fixing typos, table formatting and incorrect descriptions. Also add documentation of MapReduce client deployment to `deploy.md`.
### Why are the changes needed?
The Celeborn documentation currently contains typos and formatting problems. In addition, `deploy.md` does not cover deployment of the MapReduce client, which is inconsistent with `README.md` for Flink configuration.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2407 from SteNicholas/CELEBORN-1341.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Support Flink 1.19.
### Why are the changes needed?
Flink 1.19.0 has been released: [Announcing the Release of Apache Flink 1.19](https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19).
The main changes include:
- `org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel` constructor parameter changes:
  - `consumedSubpartitionIndex` changes to `consumedSubpartitionIndexSet`: [[FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel](https://github.com/apache/flink/pull/23927).
  - adds `partitionRequestListenerTimeout`: [[FLINK-25055][network] Support listen and notify mechanism for partition request](https://github.com/apache/flink/pull/23565).
- `org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor removes parameters `subpartitionIndexRange`, `tieredStorageConsumerClient`, `nettyService` and `tieredStorageConsumerSpecs`: [[FLINK-33743][runtime] Support consuming multiple subpartitions on a single channel](https://github.com/apache/flink/pull/23927).
- Change the default config file to `config.yaml` in `flink-dist`: [[FLINK-33577][dist] Change the default config file to config.yaml in flink-dist](https://github.com/apache/flink/pull/24177).
- `org.apache.flink.configuration.RestartStrategyOptions` uses `org.apache.commons.compress.utils.Sets` of `commons-compress` dependency: [[FLINK-33865][runtime] Adding an ITCase to ensure exponential-delay.attempts-before-reset-backoff works well](https://github.com/apache/flink/pull/23942).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local test:
- Flink batch job submission
```
$ ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH
Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID 2e9fb659991a9c29d376151783bdf6de
Program execution finished
Job with JobID 2e9fb659991a9c29d376151783bdf6de has finished.
Job Runtime: 1912 ms
```
- Flink batch job execution

- Celeborn master log
```
24/03/18 20:52:47,513 INFO [celeborn-dispatcher-42] Master: Offer slots successfully for 1 reducers of 1710766312631-2e9fb659991a9c29d376151783bdf6de-0 on 1 workers.
```
- Celeborn worker log
```
24/03/18 20:52:47,704 INFO [celeborn-dispatcher-1] StorageManager: created file at /Users/nicholas/Software/Celeborn/apache-celeborn-0.5.0-SNAPSHOT/shuffle/celeborn-worker/shuffle_data/1710766312631-2e9fb659991a9c29d376151783bdf6de/0/0-0-0
24/03/18 20:52:47,707 INFO [celeborn-dispatcher-1] Controller: Reserved 1 primary location and 0 replica location for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0
24/03/18 20:52:47,874 INFO [celeborn-dispatcher-2] Controller: Start commitFiles for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0
24/03/18 20:52:47,890 INFO [worker-rpc-async-replier] Controller: CommitFiles for 1710766312631-2e9fb659991a9c29d376151783bdf6de-0 success with 1 committed primary partitions, 0 empty primary partitions, 0 failed primary partitions, 0 committed replica partitions, 0 empty replica partitions, 0 failed replica partitions.
```
Closes #2399 from SteNicholas/CELEBORN-1310.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Importing details from https://github.com/apache/spark/pull/43162:
--
This PR avoids a race condition where a connection which is in the process of being closed could be returned by the TransportClientFactory only to be immediately closed and cause errors upon use.
This race condition is rare and not easily triggered, but with the upcoming changes to introduce SSL connection support, connection closing can take just a slight bit longer and it's much easier to trigger this issue.
Looking at the history of the code I believe this was an oversight in https://github.com/apache/spark/pull/9853.
--
### Why are the changes needed?
We are working towards adding TLS support, which is essentially based on Spark 4.0 TLS support, and this is one of the fixes from there.
(I have yet to file the overall TLS support jira, but this is enabling work).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests
Closes #2400 from mridulm/add-SPARK-45375.
Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2406 from cxzl25/TransportClient_typo.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Unify application module naming.
### Why are the changes needed?
Unify application module naming.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local test.
Closes #2405 from miyuesc/fix-naming.
Authored-by: MiyueSC <913784771@qq.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Unify application module naming.
### Why are the changes needed?
Keep the file name style of each module unified.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local test.
Closes #2403 from miyuesc/CELEBORN-1305-followup.
Authored-by: MiyueSC <913784771@qq.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
### Why are the changes needed?
`CELEBORN-1320` uses `ReviveManager` to batch the SOFT_SPLIT event RPCs, so `partitionSplitPool` is no longer used and the configuration item `celeborn.client.push.splitPartition.threads` has no effect.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #2396 from cxzl25/CELEBORN-1336.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
To fix a typo.
### Why are the changes needed?
To maintain the quality of Celeborn documentation.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
N/A
Closes #2397 from ForVic/forvic/fix_typo.
Authored-by: ForVic <victor.lakers0@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Bump `rocksdbjni` version from 8.5.3 to 8.11.3.
### Why are the changes needed?
The new version brings some bug fixes:
- Fix a corner case with auto_readahead_size where Prev Operation returns NOT SUPPORTED error when scans direction is changed from forward to backward.
- Avoid destroying the periodic task scheduler's default timer in order to prevent static destruction order issues.
- Fix double counting of BYTES_WRITTEN ticker when doing writes with transactions.
- Fix a WRITE_STALL counter that was reporting wrong value in few cases.
- A lookup by MultiGet in a TieredCache that goes to the local flash cache and finishes with very low latency, i.e before the subsequent call to WaitAll, is ignored, resulting in a false negative and a memory leak.
- Fix bug in auto_readahead_size that combined with IndexType::kBinarySearchWithFirstKey + fails or iterator lands at a wrong key
- Fixed some cases in which DB file corruption was detected but ignored on creating a backup with BackupEngine.
- Fix bugs where rocksdb.blobdb.blob.file.synced includes blob files failed to get synced and rocksdb.blobdb.blob.file.bytes.written includes blob bytes failed to get written.
- Fixed a possible memory leak or crash on a failure (such as I/O error) in automatic atomic flush of multiple column families.
- Fixed some cases of in-memory data corruption using mmap reads with BackupEngine, sst_dump, or ldb.
- Fixed issues with experimental preclude_last_level_data_seconds option that could interfere with expected data tiering.
- Fixed the handling of the edge case when all existing blob files become unreferenced. Such files are now correctly deleted.
For the full release notes, see [rocksdbjni releases](https://github.com/facebook/rocksdb/releases).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #2389 from SteNicholas/CELEBORN-1330.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Fix typo of `celeborn.network.bind.preferIpAddress` doc from `ture` to `true`.
### Why are the changes needed?
The doc of `celeborn.network.bind.preferIpAddress` contains the typo `ture`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes #2392 from SteNicholas/prefer-ip-address.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Fix a bug that might cause persisted committed file info to be lost.
### Why are the changes needed?
When a worker starts, it cleans its persisted committed file info and does not put it back. If the worker then restarts again, the committed file info is lost.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GA.
Closes #2390 from FMX/b863-1.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
### What changes were proposed in this pull request?
Updates Celeborn to account for [JDK-8303083](https://bugs.openjdk.org/browse/JDK-8303083), which affects JDK21 (similar to the Apache Spark change in https://github.com/apache/spark/pull/39909).
### Why are the changes needed?
Without this change, Spark users of Celeborn will encounter a runtime error similar to the following:
`Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.IllegalStateException: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long, int) [in thread "Executor task launch worker for task 0.0 in stage 0.0 (TID 0)"]
at org.apache.celeborn.common.unsafe.Platform.<clinit>(Platform.java:135)
... 16 more`
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested using standalone Spark 3.5.1 (Hadoop 3.2) and the Celeborn `main` branch with JDK21 build changes in https://github.com/apache/incubator-celeborn/pull/2385. Reproduced the runtime error above and confirmed the patch resolves it.
Closes #2387 from curtishoward/CELEBORN-1327.
Authored-by: Curtis Howard <curtis@curtiss-mbp.lan>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Although SortBasedWriter has a smaller memory footprint than HashBasedWriter, it performs poorly when there are many partitions and the write buffer quickly fills with small chunks of data.
For example, suppose the sort buffer size is 32K, there are 4 partitions (A, B, C, D) with 128K of data in total, and data arrives 8K per partition at a time. The writer then has to compress and send a small 8K chunk 4 times per partition, which is expensive. HashBasedWriter does not have this problem, because a push only happens when the per-partition buffer is full. A larger sort buffer mitigates the issue, but tuning the sort buffer size per job is tedious.
This PR introduces a feature that measures the total bytes and count of actual pushes, as well as the "should-push" bytes and count. A push counts as "should-push" when the pushed data is larger than CLIENT_PUSH_BUFFER_MAX_SIZE, in other words, when HashBasedWriter would also have triggered a push.
When actualPushedBytes/actualPushedCounts > (1 + Threshold) * (ShouldPushBytes/ShouldPushCounts), the sort buffer size is enlarged by 1X (doubled) so that more data is buffered before pushing. The maximum sort buffer size is capped at (number of partitions) * CLIENT_PUSH_BUFFER_MAX_SIZE.
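The enlargement rule stated above can be sketched as a pure function. All names here are hypothetical, the comparison is transcribed as written in this description rather than taken from the implementation, and "enlarge by 1X" is read as doubling.

```java
// Hypothetical sketch of the adaptive sort buffer rule described above.
final class AdaptiveSortBuffer {
    static long nextBufferSize(
            long currentSize,
            long actualPushedBytes, long actualPushedCount,
            long shouldPushBytes, long shouldPushCount,
            double threshold,
            int numPartitions, long pushBufferMaxSize) {
        // Without samples on both sides there is nothing to compare.
        if (actualPushedCount == 0 || shouldPushCount == 0) {
            return currentSize;
        }
        double actualAvg = (double) actualPushedBytes / actualPushedCount;
        double shouldAvg = (double) shouldPushBytes / shouldPushCount;
        // Upper bound: number of partitions * CLIENT_PUSH_BUFFER_MAX_SIZE.
        long cap = Math.multiplyExact((long) numPartitions, pushBufferMaxSize);
        if (actualAvg > (1 + threshold) * shouldAvg) {
            // Double the buffer, bounded by the cap, but never shrink it.
            return Math.max(currentSize, Math.min(currentSize * 2, cap));
        }
        return currentSize;
    }
}
```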
### Why are the changes needed?
To reduce the performance cost of the sort-based writer.
### Does this PR introduce _any_ user-facing change?
No, but it adds 2 extra configurations.
### How was this patch tested?
In production at our company, and with unit tests.
Closes #2358 from CodingCat/adaptive_memory_threshold.
Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Enable custom network location aware replication, based on a custom impl of `DNSToSwitchMapping`.
### Why are the changes needed?
Resolution of network location of multiple workers at master can be expensive at times. This way, each worker resolves its own network location and sends to master via the RegisterWorker transport message. If worker cannot resolve, fallback to attempting to resolve at master (during update meta or reload of snapshot). Proposal: [Celeborn Custom Network Location Aware Replication](https://docs.google.com/document/d/11M_MKKnIXCTExJHMX-OMTq7SBpkl8fJMlpy8hLgmev0/edit#heading=h.s3vnydz589z5)
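The fallback order described above can be sketched as follows. The names are hypothetical, and the master-side resolver is modeled as a plain function rather than an actual `DNSToSwitchMapping` implementation.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: prefer the location reported by the worker,
// and fall back to resolving on the master only when it is missing.
final class NetworkLocationResolution {
    static String resolve(
            Optional<String> reportedByWorker,
            String workerHost,
            Function<String, String> masterResolver) {
        return reportedByWorker
            .filter(location -> !location.isEmpty())
            // The master-side resolver is only invoked when the worker sent nothing usable.
            .orElseGet(() -> masterResolver.apply(workerHost));
    }
}
```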
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Updated the unit tests.
Closes #2367 from akpatnam25/CELEBORN-1313.
Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
`FakedRemoteInputChannel` uses the task name of `RemoteShuffleInputGateDelegation` as `owningTaskName`, so that the task name can be obtained from the log of `SingleInputGate`.
### Why are the changes needed?
`FakedRemoteInputChannel` currently uses an empty string as `owningTaskName`, so the task name cannot be obtained from the log of `SingleInputGate`. It should use the task name of `RemoteShuffleInputGateDelegation` as `owningTaskName` instead.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #2384 from SteNicholas/CELEBORN-1326.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Improve `SuiteJ` of `client-flink` module including:
- Align the name of test class with `SuiteJ`.
- Improve the test case of `SuiteJ`.
### Why are the changes needed?
`SuiteJ` of `client-flink` module could be improved.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes#2383 from SteNicholas/client-flink-suitej.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce `ShutdownWorkerCount` metric to record the count of workers in shutdown list.
<img width="1432" alt="image" src="https://github.com/apache/incubator-celeborn/assets/10048174/bc84b281-30ca-40a1-92e4-fb9cf10b5aeb">
### Why are the changes needed?
`/shutdownWorkers` currently lists all shutdown workers of the master. A `ShutdownWorkerCount` metric complements it by recording the number of workers in the shutdown list.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- [Celeborn Dashboard](https://stenicholas.grafana.net/public-dashboards/c44822917403401690edb15617ec9f08)
Closes#2379 from SteNicholas/CELEBORN-1323.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce application module for dashboard frontend.
### Why are the changes needed?
Dashboard frontend should support application page to display the application list and overview details of Celeborn.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local test.
Closes#2368 from miyuesc/CELEBORN-1305.
Lead-authored-by: MiyueSC <913784771@qq.com>
Co-authored-by: MiyueFE <913784771@qq.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Remove `PrometheusSink`, which is not used.
### Why are the changes needed?
Close CELEBORN-1324
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Not needed.
Closes#2381 from turboFei/remove_unused.
Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request?
Currently `SOFT_SPLIT` bypasses `ReviveManager` and sends `PartitionSplit` requests to
`LifecycleManager` individually, which can cause too many messages in `inbox`; see the issue
described in https://github.com/apache/incubator-celeborn/pull/2366
This PR uses `ReviveManager`, i.e. batched RPCs, for `SOFT_SPLIT` events. Before this PR,
the max size of `Inbox#messages` was several hundred in my experiment, where frequent soft splits happen:
```
24/03/11 18:33:05 WARN [rpc-server-4-7] Inbox: last max msg cnt in 1 second: 620
24/03/11 18:33:06 WARN [rpc-server-4-5] Inbox: last max msg cnt in 1 second: 105
24/03/11 18:33:07 WARN [rpc-server-4-14] Inbox: last max msg cnt in 1 second: 94
24/03/11 18:33:08 WARN [rpc-server-4-13] Inbox: last max msg cnt in 1 second: 726
24/03/11 18:33:09 WARN [rpc-server-4-3] Inbox: last max msg cnt in 1 second: 50]
24/03/11 18:33:10 WARN [rpc-server-4-16] Inbox: last max msg cnt in 1 second: 98
24/03/11 18:33:11 WARN [rpc-server-4-12] Inbox: last max msg cnt in 1 second: 83
24/03/11 18:33:12 WARN [rpc-server-4-11] Inbox: last max msg cnt in 1 second: 138
24/03/11 18:33:13 WARN [rpc-server-4-9] Inbox: last max msg cnt in 1 second: 315
24/03/11 18:33:14 WARN [rpc-server-4-4] Inbox: last max msg cnt in 1 second: 787
```
After this PR, the size is reduced by an order of magnitude:
```
24/03/11 18:39:17 WARN [rpc-server-4-5] Inbox: last max msg cnt in 1 second: 30]
24/03/11 18:39:18 WARN [rpc-server-4-12] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:19 WARN [rpc-server-4-19] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:20 WARN [rpc-server-4-15] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:21 WARN [rpc-server-4-3] Inbox: last max msg cnt in 1 second: 10]
24/03/11 18:39:22 WARN [rpc-server-4-20] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:23 WARN [rpc-server-4-12] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:24 WARN [rpc-server-4-24] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:25 WARN [rpc-server-4-9] Inbox: last max msg cnt in 1 second: 10]
24/03/11 18:39:26 WARN [rpc-server-4-13] Inbox: last max msg cnt in 1 second: 1]
24/03/11 18:39:27 WARN [rpc-server-4-2] Inbox: last max msg cnt in 1 second: 10]
24/03/11 18:39:28 WARN [rpc-server-4-2] Inbox: last max msg cnt in 1 second: 80]
```
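The batching idea behind this change can be illustrated with a small sketch. It is not the `ReviveManager` implementation: `RequestBatcher`, `submit`, `flush`, and the `maxBatchSize` parameter are hypothetical names, and the sketch only shows how queuing requests and sending them together reduces the number of messages the receiver sees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of request batching; not Celeborn's ReviveManager.
final class RequestBatcher<T> {
  private final LinkedBlockingQueue<T> pending = new LinkedBlockingQueue<>();
  private final Consumer<List<T>> sendBatch;
  private final int maxBatchSize;

  RequestBatcher(int maxBatchSize, Consumer<List<T>> sendBatch) {
    this.maxBatchSize = maxBatchSize;
    this.sendBatch = sendBatch;
  }

  /** Queues a request instead of sending it immediately. */
  void submit(T request) {
    pending.add(request);
  }

  /**
   * Drains up to maxBatchSize queued requests and sends them as one message.
   * Returns the number of requests included in the batch.
   */
  int flush() {
    List<T> batch = new ArrayList<>();
    pending.drainTo(batch, maxBatchSize);
    if (!batch.isEmpty()) {
      sendBatch.accept(batch);
    }
    return batch.size();
  }
}
```

A caller would typically invoke `flush()` from a periodic task, so the receiver handles one message per interval rather than one per split request.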
### Why are the changes needed?
ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
GA and manual test.
Closes#2377 from waitinfuture/1320.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Rename `LostWorkers` metric to `LostWorkerCount` to align the naming style of other worker count metrics.
### Why are the changes needed?
The name of the `LostWorkers` metric differs from other `MasterSource` metrics such as `WorkerCount` and `ExcludedWorkerCount`. Renaming it to `LostWorkerCount` aligns the naming style.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2378 from SteNicholas/CELEBORN-1322.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Change the noisy expire-shuffle log to debug level and aggregate the log output.
### Why are the changes needed?
Remove the noisy expire log.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Closes#2376 from AngersZhuuuu/CELEBORN-1321.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
This enables a Celeborn Worker to retrieve the application meta from the Master if it hasn't received the secret from the Master before the application attempts to connect to it. Additionally, the Celeborn Worker's SecretRegistry has been converted into an LRU cache to prevent unbounded growth of the registry.
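One common way to bound a registry like this in Java is an access-ordered `LinkedHashMap` that evicts its eldest entry. The sketch below illustrates that general technique only; `LruSecretCache` is a hypothetical name, the PR may use a different cache implementation, and this version is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU-bounded registry; not the PR's SecretRegistry.
// Not thread-safe: wrap with external synchronization for concurrent use.
final class LruSecretCache extends LinkedHashMap<String, String> {
  private final int capacity;

  LruSecretCache(int capacity) {
    // accessOrder = true: iteration order runs from least to most recently used.
    super(16, 0.75f, true);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
    // Evict the least recently used entry once the capacity is exceeded.
    return size() > capacity;
  }
}
```

With such a cache, a miss on the worker (including one caused by eviction) would be handled by fetching the application meta from the master again, as described above.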
### Why are the changes needed?
This is the last change needed for Auth support in Celeborn (https://issues.apache.org/jira/browse/CELEBORN-1011)
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added UTs. This is part of a bigger change that will be tested end-to-end.
Closes#2363 from otterc/CELEBORN-1179.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix style and Gluten link in Developers Doc.
### Why are the changes needed?
- `slotsallocation.md` has the following wrong style:
<img width="1434" alt="image" src="https://github.com/apache/incubator-celeborn/assets/10048174/97fb53ed-473d-4f3d-8231-1fb613df9132">
- Gluten is an Apache incubating project, so the link to the Gluten project should be [Gluten](https://github.com/apache/incubator-gluten).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#2375 from SteNicholas/developers-doc.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>