[CELEBORN-1341][FOLLOWUP] Improve Celeborn document

### What changes were proposed in this pull request?

Improve the Celeborn documentation by fixing typos, formatting, invalid links, and a default value that is out of sync with the code. Meanwhile, keep the public interfaces documented in `shuffleclient.md` consistent with `ShuffleClient`.

### Why are the changes needed?

The Celeborn documentation currently contains typos, formatting issues, invalid links, and a default value that is out of sync with the code. Meanwhile, the public interfaces documented in `shuffleclient.md` are inconsistent with `ShuffleClient`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2410 from SteNicholas/CELEBORN-1341.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
SteNicholas 2024-03-22 16:34:25 +08:00 committed by mingji
parent d62f75fdc7
commit 8fbcbead48
GPG Key ID: 6392F71F37356FA0
8 changed files with 20 additions and 13 deletions

@@ -46,7 +46,7 @@ For more information of Celeborn configurations, see [CONFIGURATIONS](../CONFIGU
#### Install Celeborn
```
helm install celeborn ${CELEBORN_HOME}/charts/celebron -n ${celeborn namespace}
helm install celeborn ${CELEBORN_HOME}/charts/celeborn -n ${celeborn namespace}
```
#### Connect to Celeborn in K8s pod

@@ -20,7 +20,7 @@ license: |
---
Quick Start
===
This documentation gives a quick start guide for running Apache Spark/Flink/MapReduce with Apache Celeborn™(Incubating).
This documentation gives a quick start guide for running Spark/Flink/MapReduce with Apache Celeborn™(Incubating).
### Download Celeborn
Download the latest Celeborn binary from the [Downloading Page](https://celeborn.apache.org/download/).
@@ -126,11 +126,13 @@ cp $CELEBORN_HOME/flink/<Celeborn Client Jar> $FLINK_HOME/lib/
```
#### Add Celeborn configuration to Flink's conf
Set `shuffle-service-factory.class` to Celeborn's ShuffleServiceFactory in Flink configuration file:
- Flink 1.14.x, Flink 1.15.x, Flink 1.17.x, Flink 1.18.x
```shell
cd $FLINK_HOME
vi conf/flink-conf.yaml
```
- Flink 1.19.x
```shell
cd $FLINK_HOME

@@ -189,7 +189,7 @@ Example can refer to [Hadoop Rack Awareness](https://hadoop.apache.org/docs/stab
`ShuffleClient` records the shuffle partition location's host, service port, and filename,
to support workers recovering reading existing shuffle data after worker restart,
during worker shutdown, workers should store the meta about reading shuffle partition files in LevelDB,
during worker shutdown, workers should store the meta about reading shuffle partition files in RocksDB or LevelDB(deprecated),
and restore the meta after restarting workers, also workers should keep a stable service port to support
`ShuffleClient` retry reading data. Users should set `celeborn.worker.graceful.shutdown.enabled` to `true` and
set below service port with stable port to support worker recover status.

@@ -33,7 +33,7 @@ Celeborn currently supports rapid deployment by using helm.
### 1. Get Celeborn Binary Package
You can find released version of Celeborn on https://celeborn.apache.org/download/.
You can find released version of Celeborn on [Downloading Page](https://celeborn.apache.org/download/).
Of course, you can build binary package from master branch or your own branch by using `./build/make-distribution.sh` in
source code.
@@ -139,7 +139,7 @@ network infrastructure, this may cause pressure on DNS service or other network
### 6. Build Celeborn Client
Here, without going into detail on how to configure spark/flink to find celeborn master/worker, mention the key
Here, without going into detail on how to configure Spark/Flink/MapReduce to find celeborn master/worker, mention the key
configuration:
```
@@ -149,5 +149,5 @@ spark.celeborn.master.endpoints: celeborn-master-0.celeborn-master-svc.<namespac
You can find why config endpoints such way
in [Kubernetes DNS for Service And Pods](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/)
> Notice: You should ensure that Spark/Flink can find the Celeborn Master/Worker via IP or the Kubernetes DNS mentioned
> Notice: You should ensure that Spark/Flink/MapReduce can find the Celeborn Master/Worker via IP or the Kubernetes DNS mentioned
> above

@@ -20,8 +20,7 @@ license: |
## Overview
The core components of Celeborn, i.e. `Master`, `Worker`, and `Client` are all engine irrelevant. Developers can
integrate Celeborn with various engines or applications by using or extending Celeborn's `Client`, as the officially
supported plugins for Apache Spark and Apache Flink, see [Spark Plugin](../../developers/spark) and
[Flink Plugin](../../developers/flink).
supported plugins for Spark/Flink/MapReduce.
This article briefly describes an example of integrating Celeborn into a simple distributed application using
Celeborn `Client`.

@@ -104,7 +104,7 @@ When graceful shutdown is turned on, upon shutdown, Celeborn will do the followi
2. Worker will inform Clients to split.
3. Client will send `CommitFiles` to the Worker.
Then the Worker waits until all `PartitionLocation` flushes data to persistent storage, stores states in local leveldb/rocksdb,
Then the Worker waits until all `PartitionLocation` flushes data to persistent storage, stores states in local RocksDB or LevelDB(deprecated),
then stops itself. The process is typically within one minute.
For more details, please refer to [Rolling upgrade](../../upgrading/#rolling-upgrade)

@@ -124,19 +124,25 @@ to guarantee no data is lost.
```java
public abstract CelebornInputStream readPartition(
int shuffleId,
int appShuffleId,
int partitionId,
int attemptNumber,
int startMapIndex,
int endMapIndex)
int endMapIndex,
ExceptionMaker exceptionMaker,
MetricsCallback metricsCallback)
```
- `shuffleId` is the unique shuffle id of the application
- `shuffleId` is the unique shuffle id of Celeborn
- `appShuffleId` is the unique shuffle id of the application
- `partitionId` is the partition id to read from
- `attemptNumber` is the attempt id of reduce task, can be safely set to any value
- `startMapIndex` is the index of start map index of interested map range, set to 0 if you want to read all
partition data
- `endMapIndex` is the index of end map index of interested map range, set to `Integer.MAX_VALUE` if you want
to read all partition data
- `exceptionMaker` is the marker of exception including fetch failure exception.
- `metricsCallback` is the callback of monitoring metrics to increase read bytes and time etc.
The returned input stream is guaranteed to be `Exactly Once`, meaning no data lost and no duplicated reading, or else
an exception will be thrown, see [Here](../../developers/faulttolerant#exactly-once).
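
The `readPartition` parameters documented above can be illustrated with a small, self-contained Java sketch. It assumes the map range is half-open, `[startMapIndex, endMapIndex)`, as in Spark's shuffle readers, so `0` and `Integer.MAX_VALUE` select every map output. `MapRange`, `CountingMetricsCallback`, and the `MetricsCallback` interface below are hypothetical stand-ins written for illustration; they are not Celeborn's classes, and their method names are assumptions.

```java
// Hypothetical stand-ins mirroring the documented readPartition parameters;
// these are NOT Celeborn's real classes or method signatures.
final class ReadPartitionSketch {

  /** Hypothetical analogue of the metricsCallback parameter. */
  interface MetricsCallback {
    void incBytesRead(long bytesRead);

    void incReadTime(long timeMs);
  }

  /** Accumulates whatever a reader reports through the callback. */
  static final class CountingMetricsCallback implements MetricsCallback {
    long totalBytes;
    long totalTimeMs;

    @Override
    public void incBytesRead(long bytesRead) {
      totalBytes += bytesRead;
    }

    @Override
    public void incReadTime(long timeMs) {
      totalTimeMs += timeMs;
    }
  }

  /** Map range of one read, assumed half-open: [startMapIndex, endMapIndex). */
  static final class MapRange {
    final int startMapIndex;
    final int endMapIndex;

    MapRange(int startMapIndex, int endMapIndex) {
      if (startMapIndex < 0 || endMapIndex < startMapIndex) {
        throw new IllegalArgumentException(
            "invalid map range [" + startMapIndex + ", " + endMapIndex + ")");
      }
      this.startMapIndex = startMapIndex;
      this.endMapIndex = endMapIndex;
    }

    /** Covers all partition data: start at 0, end at Integer.MAX_VALUE. */
    static MapRange all() {
      return new MapRange(0, Integer.MAX_VALUE);
    }

    /** True if the map task with this index falls inside the range. */
    boolean contains(int mapIndex) {
      return mapIndex >= startMapIndex && mapIndex < endMapIndex;
    }
  }

  public static void main(String[] args) {
    MapRange everything = MapRange.all();
    MapRange firstTen = new MapRange(0, 10);
    System.out.println(everything.contains(12345)); // true
    System.out.println(firstTen.contains(10)); // false: the end index is exclusive here

    CountingMetricsCallback metrics = new CountingMetricsCallback();
    // A reader would report each fetched chunk in this way.
    metrics.incBytesRead(4096);
    metrics.incReadTime(3);
    System.out.println(metrics.totalBytes + " bytes, " + metrics.totalTimeMs + " ms"); // 4096 bytes, 3 ms
  }
}
```

A real caller would pass its chosen range, together with its own `ExceptionMaker` and `MetricsCallback` instances, to `ShuffleClient#readPartition`.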

@@ -50,7 +50,7 @@ Users can increase the configuration value appropriately according to the situat
Shuffle client records the shuffle partition location's host, service port, and filename,
to support workers recovering reading existing shuffle data after worker restart,
during worker shutdown, workers should store the meta about reading shuffle partition files
in LevelDB, and restore the meta after restarting workers.
in RocksDB or LevelDB(deprecated), and restore the meta after restarting workers.
Users should set `celeborn.worker.graceful.shutdown.enabled` to `true` to enable graceful shutdown.
During this process, worker will wait all allocated partition's in this worker to be committed
within a timeout of `celeborn.worker.graceful.shutdown.checkSlotsFinished.timeout`, which default value is `480s`.
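
The wait described above, in which a shutting-down worker waits for all allocated partitions to be committed within `celeborn.worker.graceful.shutdown.checkSlotsFinished.timeout` (default `480s`), follows a common deadline-bounded polling pattern. The Java sketch below shows that generic pattern only; it is not Celeborn's implementation, and `waitUntilCommitted` and its `BooleanSupplier` condition are hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// A generic deadline-bounded wait, sketching "wait for all allocated partitions
// to be committed within a timeout". This is NOT Celeborn's implementation.
final class GracefulShutdownWaitSketch {

  /**
   * Polls allCommitted until it returns true or the timeout elapses.
   * Returns true if the condition held in time, false on timeout or interrupt.
   */
  static boolean waitUntilCommitted(
      BooleanSupplier allCommitted, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!allCommitted.getAsBoolean()) {
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        return false; // timed out: some slots are still uncommitted
      }
      // Sleep for one poll interval, but never beyond the deadline.
      long sleepNanos = Math.min(pollInterval.toNanos(), remaining);
      try {
        Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for "all slots committed": becomes true after ~50 ms.
    long readyAt = System.nanoTime() + Duration.ofMillis(50).toNanos();
    boolean committed =
        waitUntilCommitted(
            () -> System.nanoTime() >= readyAt,
            Duration.ofSeconds(480), // documented default of checkSlotsFinished.timeout
            Duration.ofMillis(10));
    System.out.println("committed before timeout: " + committed);
  }
}
```

Polling keeps the sketch short; an implementation could equally block on a latch or condition that is signaled as each partition commits.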
@@ -70,7 +70,7 @@ In order to speed up the restart process, worker let all push data requests retu
during worker shutdown, and shuffle client will re-apply for a new partition location for these allocated partitions.
Then client side can record all HARD_SPLIT partition information and pre-commit these partition,
then the worker side allocated partitions can be committed in a very short time. User should enable
`celeborn.client.shuffle.batchHandleCommitPartition.enabled`, the default value is false.
`celeborn.client.shuffle.batchHandleCommitPartition.enabled`, the default value is true.
### Example setting