### What changes were proposed in this pull request?
Update the script to start word count demo.
### Why are the changes needed?
A user reported that he could not run the demo while following the quick start docs.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
Cluster.
Closes#1880 from FMX/CELEBORN-948.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR fixes performance degradation when Spark's coalescePartitions takes effect caused
by RPC latency.
### Why are the changes needed?
I encountered a performance degradation when testing tpcds 10T q10:
||Time|
|---|---|
|ESS|14s|
|Celeborn| 24s|
After digging into it I found out that q10 triggers partition coalescence:

As I configured `spark.sql.adaptive.coalescePartitions.initialPartitionNum` to 1000, `CelebornShuffleReader`
will call `shuffleClient.readPartition` sequentially 1000 times, causing the delay.
This PR optimizes by calling `shuffleClient.readPartition` in parallel. After this PR q10 time becomes 14s.
### Does this PR introduce _any_ user-facing change?
No, but introduced a new client side configuration `celeborn.client.streamCreatorPool.threads`
which defaults to 32.
### How was this patch tested?
TPCDS 1T and passes GA.
Closes#1876 from waitinfuture/943.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
In MapPartiitoin, datas are split into regions.
1. Unlike ReducePartition whose partition split can occur on pushing data
to keep MapPartition data ordering, PartitionSplit only be done on the time of sending PushDataHandShake or RegionStart messages (As shown in the following image). That's to say that the partition split only appear at the beginnig of a region but not inner a region.
> Notice: if the client side think that it's failed to push HandShake or RegionStart messages. but the worker side can still receive normal HandShake/RegionStart message. After client revive succss, it don't push any messages to old partition, so the worker having the old partition will create a empty file. After committing files, the worker will return empty commitids. That's to say that empty file will be filterd after committing files and ReduceTask will not read any empty files.

2. PushData/RegioinFinish don't care the following cases:
- Diskfull
- ExceedPartitionSplitThreshold
- Worker ShuttingDown
so if one of the above three conditions appears, PushData and RegionFinish cant still do as normal. Workers should consider the ShuttingDown case and try best to wait all the regions finished before shutting down.
if PushData or RegionFinish failed like network timeout and so on, then MapTask will failed and start another attempte maptask.

3. how shuffle read supports partition split?
ReduceTask should get split paritions by order and open the stream by partition epoc orderly
### Why are the changes needed?
PartiitonSplit is not supported by MapPartition from now.
There still a risk that a partition file'size is too large to store the file on worker disk.
To avoid this risk, this pr introduces partition split in shuffle read and shuffle write.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
UT and manual TPCDS test
Closes#1550 from FMX/CELEBORN-627.
Lead-authored-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>
Co-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <ethanfeng@apache.org>
Signed-off-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix the incorrect deploy doc about using HDFS only.
### Why are the changes needed?
Ditto.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
Just docs.
Closes#1874 from FMX/CELEBORN-941.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Introduce a quick start guide for running Apache Flink with Apache Celeborn to help Flink users to run with Celeborn.
### Why are the changes needed?
There is no quick start guide for running Apache Flink with Apache Celeborn.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
None.
Closes#1868 from SteNicholas/CELEBORN-822.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
For spark clusters, support read local shuffle file if Celeborn is co-deployed with yarn node managers. This PR help to reduce the number of active connections.
### Why are the changes needed?
Ditto.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA and cluster. The performance is identical whether you enable local reader, but the active connection number may vary according to your connections per peer.
<img width="951" alt="截屏2023-08-16 20 20 14" src="https://github.com/apache/incubator-celeborn/assets/4150993/9106e731-28fc-4e78-9c05-ae6a269d249a">
The active connection number changed from 3745 to 2894. This PR will help to improve cluster stability.
Closes#1812 from FMX/CELEBORN-752.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Correct `celeborn.metrics.conf.*.sink.csv.class` configuration example for a CSV sink.
### Why are the changes needed?
`celeborn.metrics.conf.*.sink.csv.class` configuration example for a CSV sink is wrong, which value should be `org.apache.celeborn.common.metrics.sink.CsvSink`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
None.
Closes#1865 from SteNicholas/CELEBORN-927.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix a broken link in docs/developers/overview.md.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Locally tested.
Closes#1845 from zhouyifan279/upgrade-page-link.
Authored-by: zhouyifan279 <zhouyifan279@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Adding a flag indicating high load in the worker's heartbeat allows the master to better schedule the workers
### Why are the changes needed?
In our production environment, there is a node with abnormally high load, but the master is not aware of this situation. It assigned numerous jobs to this node, and as a result, the stability of these jobs has been affected.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
UT
Closes#1840 from JQ-Cao/920.
Lead-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: caojiaqing <caojiaqing@bilibili.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
Make `celeborn.shuffle.chunk.size` worker side only config.
Add a new client side config `celeborn.client.fetch.dfsReadChunkSize` for DfsPartitionReader
### Does this PR introduce _any_ user-facing change?
Yes, the chunks size of DfsPartitionReader is changed from client side config `celeborn.shuffle.chunk.size`
to `celeborn.client.fetch.dfsReadChunkSize`
### How was this patch tested?
Passes GA
Closes#1834 from lishiyucn/main.
Lead-authored-by: lishiyucn <675590586@qq.com>
Co-authored-by: shiyu li <675590586@qq.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
After #1829 we set `celeborn.worker.directMemoryRatioToResume` default value from `0.5` to `0.7`.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
No
Closes#1836 from zwangsheng/CELEBORN-909.
Lead-authored-by: zwangsheng <2213335496@qq.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Add a new `backpressure.svg` to replace the out-date one.
### Why are the changes needed?
After #1811, we refactor celeborn worker back-pressure logic, we should add new flowchart for user to understand.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?

Closes#1829 from zwangsheng/CELEBORN-905.
Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1768 from AngersZhuuuu/CELEBORN-847.
Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Co-authored-by: Keyong Zhou <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix typo in CelebornConf
### Why are the changes needed?
Fix typo in CelebornConf
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
Passing GA
Closes#1813 from jiaoqingbo/typo-conf.
Authored-by: e <1178404354@qq.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes#1806 from cfmcgrady/sbt-docs-followup.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual test
Closes#1795 from cfmcgrady/sbt-docs.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
<!--
Thanks for sending a pull request! Here are some tips for you:
- Make sure the PR title start w/ a JIRA ticket, e.g. '[CELEBORN-XXXX] Your PR title ...'.
- Be sure to keep the PR description updated to reflect all changes.
- Please write your PR title to summarize what this PR proposes.
- If possible, provide a concise example to reproduce the issue for a faster review.
-->
### What changes were proposed in this pull request?
1. Expose the config check logic during `MemoryManager#initialization` in the user configuration doc.
2. Add Preconditions Error Message
3. Add unit test to make sure that part of the logic isn't altered by mistake
### Why are the changes needed?
User-friendly
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Add Unit Test
Closes#1801 from zwangsheng/CELEBORN-883.
Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: zwangsheng <2213335496@qq.com>
### What changes were proposed in this pull request?
Add config to limit max workers when offering slots, the config can be set both
in server side and client side. Celeborn will choose the smaller positive configs from client and master.
### Why are the changes needed?
For large Celeborn clusters, users may want to limit the number of workers that
a shuffle can spread, reasons are:
1. One worker failure will not affect all applications
2. One huge shuffle will not affect all applications
3. It's more efficient to limit a shuffle within a restricted number of workers, say 100, than
spreading across a large number of workers, say 1000, because the network connections
in pushing data is `number of ShuffleClient` * `number of allocated Workers`
The recommended number of Workers should depend on workload and Worker hardware,
and this can be configured per application, so it's relatively flexible.
### Does this PR introduce _any_ user-facing change?
No, added a new configuration.
### How was this patch tested?
Added ITs and passes GA.
Closes#1790 from waitinfuture/152.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Support worker recovery if the worker has crashed when workers has enabled graceful shutdown..
1. Persist committed file info to LevelDB.
2. Load levelDB when worker started.
3. Clean expired file infos in LevelDB.
### Why are the changes needed?
Ditto.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA and cluster. After testing on a cluster I found that 8k file infos will consume about 2MB of disk space, disk space can be reclaimed if shuffle is expired shortly.
Closes#1779 from FMX/CELEBORN-863.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1784 from kerwin-zk/gluten_celeborn.
Lead-authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Co-authored-by: Kerwin Zhang <xiyu.zk@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1788 from waitinfuture/869-fu.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1787 from waitinfuture/869.
Lead-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Co-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1782 from waitinfuture/864.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1778 from waitinfuture/860-1.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1775 from waitinfuture/853.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1769 from waitinfuture/834.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1772 from waitinfuture/849.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1770 from AngersZhuuuu/CELEBORN-851.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1731 from AngersZhuuuu/CELEBORN-808.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1759 from AngersZhuuuu/CELEBORN-832.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
make it more clarity and readability
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass CI
Closes#1763 from cfmcgrady/celeborn-822-followup.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes#1754 from waitinfuture/831.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#1752 from waitinfuture/826.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
<img width="1610" alt="截屏2023-07-24 上午11 34 43" src="https://github.com/apache/incubator-celeborn/assets/46485123/ba1b040b-9ea4-4c93-b055-75a469365ff2">
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1751 from AngersZhuuuu/CELEBORN-828.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Eliminate `chunksBeingTransferred` calculation when `celeborn.shuffle.io.maxChunksBeingTransferred` is not configured
### Why are the changes needed?
I observed high CPU usage on `ChunkStreamManager#chunksBeingTransferred` calculation. We can eliminate the method call if no threshold is configured, and investigate how to improve the method itself in the future.
<img width="1947" alt="image" src="https://github.com/apache/incubator-celeborn/assets/26535726/412c6a41-c0ce-440c-ae99-4424cb8702d3">
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI and Review.
Closes#1749 from pan3793/CELEBORN-827.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#1747 from waitinfuture/824.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#1746 from waitinfuture/823.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.

### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No.
Closes#1745 from waitinfuture/822.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
<img width="1643" alt="截屏2023-07-20 下午12 01 06" src="https://github.com/apache/incubator-celeborn/assets/46485123/d8822003-602f-4fe8-9634-ff25c0367cb1">
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes#1738 from AngersZhuuuu/CELEBORN-814.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
Cleans up the pooled send buffers and push tasks if the SendBufferPool has been idle for more than
`celeborn.client.push.sendbufferpool.expireTimeout`.
### Why are the changes needed?
Before this PR the SendBufferPool will cache the send buffers and push tasks forever. If they are large
and will not be reused in the future, it wastes memory and causes GC.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passes GA and manual tests.
Closes#1735 from waitinfuture/812-1.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix some typos and grammar
### Why are the changes needed?
Ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
manually test
Closes#1733 from onebox-li/fix-typo.
Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
Timeout of ```RpcEndpointRef.ask``` is controlled by ```celeborn.rpc.askTimeout```,
so we also need to increase ```celeborn.rpc.askTimeout``` to extend the timeout of commit files.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passes GA and manual test.
Closes#1725 from waitinfuture/803-fu.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
In 0.2.1-incubating, commit files default timeout is ```NETWORK_TIMEOUT```, which is 240s.
It's more reasonable because commit files costs relatively long time. In my testing with tough disks,
30s timeout with 2 retires is not enough.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passes GA and manual test.
Closes#1724 from waitinfuture/803.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
…up client
### What changes were proposed in this pull request?
Add heartbeat from client to lifecycle manager. In this PR heartbeat request contains local shuffle ids from
client, lifecycle manager checks with it's local set and returns ids it doesn't know. Upon receiving response,
client calls ```unregisterShuffle``` for cleanup.
### Why are the changes needed?
Before this PR, client side ```unregisterShuffle``` is never called. When running TPCDS 3T with spark thriftserver
without DRA, I found the Executor's heap contains 1.6 million PartitionLocation objects (and StorageInfo):

After this PR, the number of PartitionLocation objects decreases to 275 thousands

This heartbeat can be extended in the future for other purposes, i.e. reporting client's metrics.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passes GA and manual test.
Closes#1719 from waitinfuture/798.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
…Flight.total```
### What changes were proposed in this pull request?
Refer to https://github.com/apache/incubator-celeborn/pull/1720#discussion_r1265092164
### Why are the changes needed?
ditto
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passes GA.
Closes#1723 from waitinfuture/799-fu.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
In case where worker instances is very large, say 1000, then before this PR total memory consumed
by inflight requests is 64K * 1000 * ```celeborn.client.push.maxReqsInFlight(16)``` = 1G. This PR
limits total inflight push requests, as 0.2.1-incubating does.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passes GA and manual test.
Closes#1720 from waitinfuture/799.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Decrease writeTime metric sampling frequency to improve perf
2. Set default value of ```celeborn.<module>.push.timeoutCheck.threads``` and ```celeborn.<module>.fetch.timeoutCheck.threads``` to 4
### Why are the changes needed?
Following are test cases
case 1: ```spark.sparkContext.parallelize(1 to 8000, 8000).flatMap( _ => (1 to 15000000).iterator.map(num => num)).repartition(8000).count``` // shuffle 1.1T data
case 2: ```spark.sparkContext.parallelize(1 to 8000, 8000).flatMap( _ => (1 to 30000000).iterator.map(num => num)).repartition(8000).count``` // shuffle 2.2T data
Following are e2e time of shuffle write stage
||Sort pusher before|Sort pusher after|Hash pusher before|Hash pusher after|
|----|----|----|----|-----|
|case1|4.4min|4.1min|4.4min|3.9min|
|case2|9.1min|8.4min|9.7min|8.5min|
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passes GA and manual test.
Closes#1718 from waitinfuture/797.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>