celeborn/network.md at 1f95ccb55a1f99bb7caeda00c31c400185df8752

[CELEBORN-102][REFACTOR] TIMEOUT default value should be changed with network timeout (#1047 )

2022-12-06 14:41:23 +08:00

4.3 KiB

license
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

| Key | Default | Description | Since |
| --- | ------- | ----------- | ----- |
| `celeborn.<module>.decoder.mode` | default | Netty `TransportFrameDecoder` implementation. Available options: `default`, `supplier`. | |
| `celeborn.<module>.io.backLog` | 0 | Requested maximum length of the queue of incoming connections. The default, 0, means no backlog. | |
| `celeborn.<module>.io.clientThreads` | 0 | Number of threads used in the client thread pool. Defaults to 0, which means 2x the number of cores. | |
| `celeborn.<module>.io.connectTimeout` | `<value of celeborn.network.connect.timeout>` | Socket connect timeout. | |
| `celeborn.<module>.io.connectionTimeout` | `<value of celeborn.network.timeout>` | Connection active timeout. | |
| `celeborn.<module>.io.enableVerboseMetrics` | false | Whether to track detailed Netty memory metrics. If true, the detailed metrics of Netty's `PooledByteBufAllocator` are collected; otherwise only general memory usage is tracked. | |
| `celeborn.<module>.io.lazyFD` | true | Whether to initialize file descriptors lazily. If true, file descriptors are created only when data is about to be transferred, which can reduce the number of open files. | |
| `celeborn.<module>.io.maxRetries` | 3 | Maximum number of times a request is retried after an IO exception (such as a connection timeout). If set to 0, no retries are performed. | |
| `celeborn.<module>.io.mode` | NIO | Netty `EventLoopGroup` backend. Available options: `NIO`, `EPOLL`. | |
| `celeborn.<module>.io.numConnectionsPerPeer` | 2 | Number of concurrent connections between two nodes. | |
| `celeborn.<module>.io.preferDirectBufs` | true | If true, Netty prefers allocating off-heap (direct) byte buffers. | |
| `celeborn.<module>.io.receiveBuffer` | 0b | Receive buffer size (`SO_RCVBUF`). Note: the optimal size for the receive and send buffers is latency × network bandwidth. Assuming latency = 1 ms and network bandwidth = 10 Gbps, the buffer size should be about 1.25 MB. | 0.2.0 |
| `celeborn.<module>.io.retryWait` | 5s | Time to wait before retrying after an `IOException`. Only relevant if `celeborn.<module>.io.maxRetries` > 0. | 0.2.0 |
| `celeborn.<module>.io.sendBuffer` | 0b | Send buffer size (`SO_SNDBUF`). | 0.2.0 |
| `celeborn.<module>.io.serverThreads` | 0 | Number of threads used in the server thread pool. Defaults to 0, which means 2x the number of cores. | |
| `celeborn.network.connect.timeout` | 10s | Default socket connect timeout. | 0.2.0 |
| `celeborn.network.timeout` | 240s | Default timeout for network operations. | 0.2.0 |
| `celeborn.port.maxRetries` | 1 | Maximum number of retries when the port is already occupied. | 0.2.0 |
| `celeborn.rpc.askTimeout` | `<value of celeborn.network.timeout>` | Timeout for RPC ask operations. | 0.2.0 |
| `celeborn.rpc.connect.threads` | 64 | | 0.2.0 |
| `celeborn.rpc.haClient.askTimeout` | `<value of celeborn.network.timeout>` | Timeout for HA client RPC ask operations. | 0.2.0 |
| `celeborn.rpc.lookupTimeout` | 30s | Timeout for RPC lookup operations. | 0.2.0 |
| `celeborn.shuffle.maxChunksBeingTransferred` | 9223372036854775807 | The maximum number of chunks allowed to be transferred at the same time on the shuffle service. The default is Java's `Long.MAX_VALUE`, which is effectively unlimited. Note that new incoming connections are closed when the maximum is hit. The client retries according to the shuffle retry configs (see `celeborn.shuffle.io.maxRetries` and `celeborn.shuffle.io.retryWait`); if those limits are reached, the task fails with a fetch failure. | 0.2.0 |
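
`celeborn.<module>.io.clientThreads` and `celeborn.<module>.io.serverThreads` use 0 as a sentinel for "2x the number of cores". The sketch below shows that resolution rule. `effective_threads` is a hypothetical helper, not Celeborn code. The table does not say how Celeborn counts cores, so Python's `os.cpu_count()` is only a stand-in.

```python
import os
from typing import Optional


def effective_threads(configured: int, cores: Optional[int] = None) -> int:
    """Hypothetical helper: resolve io.clientThreads / io.serverThreads.

    A positive value is used as-is; 0 (the documented default) means 2x the core count.
    """
    if configured < 0:
        raise ValueError("thread count must be >= 0")
    if configured > 0:
        return configured
    if cores is None:
        # Stand-in for Celeborn's own core detection; os.cpu_count() may return None.
        cores = os.cpu_count() or 1
    return 2 * cores


if __name__ == "__main__":
    print(effective_threads(0, cores=8))  # 16
    print(effective_threads(32))          # 32
```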
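
Several timeouts in the table default to `<value of celeborn.network.timeout>` or `<value of celeborn.network.connect.timeout>`. Changing a global key therefore also changes every timeout that inherits from it, unless that specific key is set explicitly. The Python sketch below models this lookup order under a few assumptions:

- It illustrates the documented defaults only and is not Celeborn's configuration code.
- `fallback_key` and `effective_timeout_s` are hypothetical helpers.
- Values are plain integer seconds rather than duration strings such as `240s`.
- `<module>` is filled in with `shuffle`, as in the `celeborn.shuffle.io.*` keys mentioned above.

```python
from typing import Dict

NETWORK_TIMEOUT = "celeborn.network.timeout"
CONNECT_TIMEOUT = "celeborn.network.connect.timeout"

# Documented defaults of the two global keys, in seconds.
GLOBAL_DEFAULTS_S = {NETWORK_TIMEOUT: 240, CONNECT_TIMEOUT: 10}


def fallback_key(key: str) -> str:
    """Hypothetical helper: the global key a per-module or RPC timeout inherits from."""
    if key.endswith(".io.connectTimeout"):
        return CONNECT_TIMEOUT
    if key.endswith(".io.connectionTimeout") or key in (
        "celeborn.rpc.askTimeout",
        "celeborn.rpc.haClient.askTimeout",
    ):
        return NETWORK_TIMEOUT
    raise KeyError(f"{key} has no documented fallback")


def effective_timeout_s(conf: Dict[str, int], key: str) -> int:
    """An explicit value wins; otherwise use the (possibly overridden) global key."""
    if key in conf:
        return conf[key]
    base = fallback_key(key)
    return conf.get(base, GLOBAL_DEFAULTS_S[base])


if __name__ == "__main__":
    conf = {NETWORK_TIMEOUT: 120}  # the user lowered only the global network timeout
    print(effective_timeout_s(conf, "celeborn.rpc.askTimeout"))              # 120
    print(effective_timeout_s(conf, "celeborn.shuffle.io.connectTimeout"))   # 10
    print(effective_timeout_s({}, "celeborn.shuffle.io.connectionTimeout"))  # 240
```

Keys with a fixed default, such as `celeborn.rpc.lookupTimeout` (30s), do not follow either global key, so the sketch rejects them instead of guessing.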
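
One reading of `celeborn.<module>.io.maxRetries` and `celeborn.<module>.io.retryWait` is a fixed-wait retry policy: one initial attempt, then up to `maxRetries` further attempts, each preceded by a `retryWait` pause. The Python sketch below models that reading. `call_with_retries` is a hypothetical helper, and Python's `OSError` stands in for Java's `IOException`. Celeborn's client may differ in details such as exactly which exceptions it retries. Under this model, the defaults (3 retries, 5s wait) add up to 15 s of waiting per request, on top of the time each failed attempt takes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 3,       # mirrors celeborn.<module>.io.maxRetries
    retry_wait_s: float = 5.0,  # mirrors celeborn.<module>.io.retryWait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: run `request`, retrying on OSError with a fixed wait."""
    retries_done = 0
    while True:
        try:
            return request()
        except OSError:
            # Give up once the retry budget is spent (immediately if max_retries == 0).
            if retries_done >= max_retries:
                raise
            retries_done += 1
            sleep(retry_wait_s)


if __name__ == "__main__":
    attempts = []

    def flaky_fetch() -> str:
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("simulated connection timeout")  # an OSError subclass
        return "chunk"

    waits = []
    print(call_with_retries(flaky_fetch, sleep=waits.append))  # chunk
    print(len(attempts), waits)                                # 3 [5.0, 5.0]
```

Injecting `sleep` keeps the sketch testable without real delays.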
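
The sizing note for `celeborn.<module>.io.receiveBuffer` is the bandwidth-delay product. A small helper makes the unit conversion explicit. Latency is usually quoted in milliseconds or microseconds and bandwidth in bits per second, while the buffer size is in bytes. `bdp_bytes` is a hypothetical helper, not part of Celeborn. It uses integer arithmetic so the results are exact.

```python
def bdp_bytes(latency_us: int, bandwidth_bps: int) -> int:
    """Hypothetical helper: bandwidth-delay product in bytes.

    bits in flight = bandwidth (bits/s) * latency (s); latency is given in
    microseconds, so divide by 1_000_000, and by 8 to convert bits to bytes.
    """
    return latency_us * bandwidth_bps // (8 * 1_000_000)


if __name__ == "__main__":
    # The example from the table: 1 ms latency on a 10 Gbps link.
    print(bdp_bytes(latency_us=1_000, bandwidth_bps=10_000_000_000))  # 1250000 (~1.25 MB)
```

The result is the byte count you would size both `celeborn.<module>.io.receiveBuffer` and `celeborn.<module>.io.sendBuffer` against.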
