celeborn/docs/configuration/columnar-shuffle.md
Fu Chen aa3bb0ac3b
[CELEBORN-679] Optimize Utils#bytesToString
### What changes were proposed in this pull request?

Refer to https://github.com/apache/spark/pull/40301.

1. Optimize `Utils.bytesToString`. Arithmetic ops on BigInt and BigDecimal are order(s) of magnitude slower than the ops on primitive types. Division is an especially slow operation and it is used en masse here.

2. Per [Wikipedia](https://en.wikipedia.org/wiki/Kilobyte), 1000 is the factor for kilobytes (KB), while 1024 is the factor for kibibytes (KiB). Since `Utils.bytesToString` divides by 1024, change the size unit label from "KB" to "KiB".
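
As a rough illustration of both items, the sketch below formats a byte count using only `long` and `double` arithmetic and binary unit labels. It is not the Celeborn or Spark implementation: the class name `BytesToStringSketch`, the rule of switching to a larger unit at twice that unit's size, and the `%.1f` format are assumptions made for this example.

```java
import java.util.Locale;

// Hypothetical sketch: formats byte counts with primitive arithmetic only.
final class BytesToStringSketch {
    private static final long KIB = 1L << 10;
    private static final long MIB = 1L << 20;
    private static final long GIB = 1L << 30;
    private static final long TIB = 1L << 40;
    private static final long PIB = 1L << 50;
    private static final long EIB = 1L << 60;

    private BytesToStringSketch() {}

    static String bytesToString(long size) {
        double value;
        String unit;
        // Assumed rule: use a unit once the size is at least twice that unit.
        if (size >= 2 * EIB) {
            value = (double) size / EIB;
            unit = "EiB";
        } else if (size >= 2 * PIB) {
            value = (double) size / PIB;
            unit = "PiB";
        } else if (size >= 2 * TIB) {
            value = (double) size / TIB;
            unit = "TiB";
        } else if (size >= 2 * GIB) {
            value = (double) size / GIB;
            unit = "GiB";
        } else if (size >= 2 * MIB) {
            value = (double) size / MIB;
            unit = "MiB";
        } else if (size >= 2 * KIB) {
            value = (double) size / KIB;
            unit = "KiB";
        } else {
            value = (double) size;
            unit = "B";
        }
        // Locale.US keeps the decimal separator stable across environments.
        return String.format(Locale.US, "%.1f %s", value, unit);
    }

    public static void main(String[] args) {
        System.out.println(bytesToString(32768L));   // 32.0 KiB
        System.out.println(bytesToString(3L << 20)); // 3.0 MiB
    }
}
```

A single floating-point division per call replaces the repeated `BigInt`/`BigDecimal` operations that item 1 describes as slow.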

### Why are the changes needed?

The `Utils#bytesToString` method is frequently used in memory-related log messages.

### Does this PR introduce _any_ user-facing change?

No, this is only a performance improvement.

### How was this patch tested?

Existing unit tests and manual testing.

Closes #1590 from cfmcgrady/bytesToString.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-14 17:42:16 +08:00


license
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
| Key | Default | Description | Since |
| --- | ------- | ----------- | ----- |
| celeborn.columnarShuffle.batch.size | 10000 | Vector batch size for columnar shuffle. | 0.3.0 |
| celeborn.columnarShuffle.codegen.enabled | false | Whether to use codegen for columnar-based shuffle. | 0.3.0 |
| celeborn.columnarShuffle.enabled | false | Whether to enable columnar-based shuffle. | 0.2.0 |
| celeborn.columnarShuffle.encoding.dictionary.enabled | false | Whether to use dictionary encoding for columnar-based shuffle data. | 0.3.0 |
| celeborn.columnarShuffle.encoding.dictionary.maxFactor | 0.3 | Max factor for dictionary size. The max dictionary size is `min(32.0 KiB, celeborn.columnarShuffle.batch.size * celeborn.columnarShuffle.encoding.dictionary.maxFactor)`. | 0.3.0 |
| celeborn.columnarShuffle.offHeap.enabled | false | Whether to use off heap columnar vector. | 0.3.0 |
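
To make the max dictionary size formula concrete, the sketch below evaluates it for the default values in the table. It is an illustration only, not Celeborn code: the class and method names are hypothetical, and it assumes the 32.0 KiB cap can be compared as the plain number 32768 against the product of batch size and max factor.

```java
// Hypothetical sketch of: min(32.0 KiB, batch.size * maxFactor).
final class DictionarySizeSketch {
    // Assumption: the 32.0 KiB cap is treated as the number 32768.
    static final int DICTIONARY_CAP = 32 * 1024;

    private DictionarySizeSketch() {}

    static int maxDictionarySize(int batchSize, double maxFactor) {
        // Take the smaller of the fixed cap and the scaled batch size.
        return (int) Math.min((double) DICTIONARY_CAP, batchSize * maxFactor);
    }

    public static void main(String[] args) {
        // Defaults from the table: batch.size = 10000, maxFactor = 0.3.
        System.out.println(maxDictionarySize(10000, 0.3));  // 3000
        // A much larger batch size hits the cap instead.
        System.out.println(maxDictionarySize(200000, 0.3)); // 32768
    }
}
```

With the defaults, the product (3000) is well below the cap, so the cap only matters once `celeborn.columnarShuffle.batch.size * celeborn.columnarShuffle.encoding.dictionary.maxFactor` exceeds 32768.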