[CELEBORN-1789][DOC] Document on Java Columnar Shuffle
### What changes were proposed in this pull request? Introduction to Celeborn's Java Columnar Shuffle ### Why are the changes needed? Introduction to Celeborn's Java Columnar Shuffle ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI Closes #3010 from kerwin-zk/CELEBORN-1789. Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com> Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
This commit is contained in:
parent
6028a049df
commit
4b60dae0f0
56
docs/developers/java-columnar-shuffle.md
Normal file
56
docs/developers/java-columnar-shuffle.md
Normal file
@ -0,0 +1,56 @@
|
||||
---
|
||||
license: |
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
https://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
---
|
||||
|
||||
# Introduction to Celeborn's Java Columnar Shuffle
|
||||
|
||||
## Overview
|
||||
|
||||
Celeborn presents a Java Columnar Shuffle designed to enhance performance and efficiency in SparkSQL and DataFrame operations. This innovative approach leverages a columnar format for shuffle operations, achieving a higher compression rate than traditional Row-based Shuffle methods. This improvement leads to significant savings in disk space usage during shuffle operations.
|
||||
|
||||
## Benefits
|
||||
|
||||
- **High Compression Rate**: By organizing data into a columnar format, this feature significantly increases the compression ratio, reducing the disk space required for Shuffle data.
|
||||
However, enabling this optimization incurs overhead for row-to-column and column-to-row transformations. If disk space is a higher priority, it is recommended to enable this feature.
|
||||
If performance is a higher priority, it is advisable to weigh the trade-offs.
|
||||
|
||||
## Configuration
|
||||
|
||||
To leverage Celeborn's Java Columnar Shuffle, you need to apply a patch and configure certain settings in Spark 3.x. Follow the steps below for implementation:
|
||||
|
||||
### Step 1: Apply this patch to obtain the schema information for shuffle
|
||||
|
||||
1. Obtain the `https://github.com/apache/celeborn/tree/main/assets/spark-patch/Celeborn_Columnar_Shuffle_spark3.patch` file that contains the modifications needed for enabling Columnar Shuffle in Spark 3.x.
|
||||
2. Navigate to your Spark source directory.
|
||||
3. Apply the patch.
|
||||
|
||||
### Step 2: Configure Celeborn Settings
|
||||
|
||||
To enable Columnar Shuffle, adjust the following configurations in your Spark application:
|
||||
Open the Spark configuration file or set these parameters in your Spark application.
|
||||
Add the following configuration settings:
|
||||
|
||||
```
|
||||
spark.celeborn.columnarShuffle.enabled true
|
||||
spark.celeborn.columnarShuffle.encoding.dictionary.enabled true
|
||||
```
|
||||
|
||||
If you require further performance optimization, consider enabling code generation with:
|
||||
|
||||
```
|
||||
spark.celeborn.columnarShuffle.codegen.enabled true
|
||||
```
|
||||
@ -95,6 +95,7 @@ nav:
|
||||
- Overview: developers/client.md
|
||||
- LifecycleManager: developers/lifecyclemanager.md
|
||||
- ShuffleClient: developers/shuffleclient.md
|
||||
- JavaColumnarShuffle: developers/java-columnar-shuffle.md
|
||||
- Configuration: developers/configuration.md
|
||||
- Fault Tolerant: developers/faulttolerant.md
|
||||
- Worker Exclusion: developers/workerexclusion.md
|
||||
|
||||
Loading…
Reference in New Issue
Block a user