### What changes were proposed in this pull request?
Remove the environment variables `CELEBORN_MASTER_HOST` and `CELEBORN_MASTER_PORT`, and make `CELEBORN_LOCAL_HOSTNAME` take effect on both master and worker.
### Why are the changes needed?
There are many different ways to configure the master/worker host and port, which makes the configuration complex and inconsistent.
After this change, the host and port are resolved as follows.
#### master
1. CLI args `--host` and `--port` take the highest priority
2. then the env `CELEBORN_LOCAL_HOSTNAME` is looked up
3. then the behavior depends on whether HA is enabled
3.1. when HA is disabled, the configurations `celeborn.master.host` and `celeborn.master.port` are looked up
3.2. when HA is enabled, each node needs to know the whole cluster info, for example
```
celeborn.master.ha.node.1.host clb-1
celeborn.master.ha.node.1.port 9097
celeborn.master.ha.node.2.host clb-2
celeborn.master.ha.node.2.port 9097
celeborn.master.ha.node.3.host clb-3
celeborn.master.ha.node.3.port 9097
```
In addition, `celeborn.master.ha.node.id=1` can be used to specify the node id; otherwise, the master will try to bind to each host to determine which node id it matches.
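The master lookup order above can be sketched as a small function. This is only an illustration of the precedence, not Celeborn's actual code: `resolve_master_endpoint` and its parameters are hypothetical names, the HA flag is passed in as a plain boolean, and it assumes that `CELEBORN_LOCAL_HOSTNAME` supplies only the host, so the port falls back from `--port` straight to the configuration. The bind-based node id detection is not modeled.

```python
def resolve_master_endpoint(cli_host, cli_port, env, conf, ha_enabled):
    """Hypothetical helper illustrating the master host/port precedence.

    cli_host / cli_port: values of --host / --port, or None when not passed.
    env: mapping of environment variables.
    conf: mapping of configuration keys to string values.
    """
    if not ha_enabled:
        # Step 3.1: plain master configuration.
        conf_host = conf.get("celeborn.master.host")
        conf_port = conf.get("celeborn.master.port")
    else:
        # Step 3.2: pick this node's entry from the cluster description.
        # Only an explicit node id is handled; without one the lookups
        # below simply find nothing and return None.
        node_id = conf.get("celeborn.master.ha.node.id")
        conf_host = conf.get(f"celeborn.master.ha.node.{node_id}.host")
        conf_port = conf.get(f"celeborn.master.ha.node.{node_id}.port")

    # Step 1 (CLI) beats step 2 (env), which beats step 3 (configuration).
    host = cli_host or env.get("CELEBORN_LOCAL_HOSTNAME") or conf_host
    # Assumption: the env variable carries no port, so only CLI and conf apply.
    port = cli_port if cli_port is not None else conf_port
    return host, (int(port) if port is not None else None)
```

For example, with `celeborn.master.host` set in the configuration and `CELEBORN_LOCAL_HOSTNAME` exported, the sketch returns the env value as the host unless `--host` is also given.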
#### worker
1. CLI args `--host` and `--port` take the highest priority
2. then the env `CELEBORN_LOCAL_HOSTNAME` is looked up

This is simpler than the master case because a worker does not need to know about the other workers.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
UT.
Closes #1616 from pan3793/CELEBORN-707.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
---
hide:
  - navigation

license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at
      http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

# Migration Guide

## Upgrading from 0.2.1 to 0.3.0

- Since 0.3.0, the default value of `celeborn.client.push.replicate.enabled` has changed from `true` to `false`.
  Users who want replication must enable it explicitly. For example, Spark users should add the following
  config when submitting a job: `spark.celeborn.client.push.replicate.enabled=true`

- Since 0.3.0, the default value of `celeborn.worker.storage.workingDir` has changed from `hadoop/rss-worker/shuffle_data` to `rss-worker/shuffle_data`.
  Users who want to keep the original working directory path should set this configuration explicitly.

- Since 0.3.0, the configuration namespace `celeborn.ha.master` is deprecated and will be removed in a future version.
  All `celeborn.ha.master.*` configurations should be migrated to `celeborn.master.ha.*`.

- Since 0.3.0, the environment variables `CELEBORN_MASTER_HOST` and `CELEBORN_MASTER_PORT` are removed.
  Instead, `CELEBORN_LOCAL_HOSTNAME` takes effect on both master and worker, and it takes higher priority than the configurations defined in the properties file.

- When using 0.2.1 on the client side and 0.3.0 on the server side, you may see the following exception in LifecycleManager's
  log. You can safely ignore it; it is caused by a behavior change in what the Master does when it receives a heartbeat from the Application.

    ??? warning "logs"
        ```
        23/06/20 18:12:30 WARN TransportChannelHandler: Exception in connection from /192.168.1.16:9097
        java.io.InvalidObjectException: enum constant HEARTBEAT_FROM_APPLICATION_RESPONSE does not exist in class org.apache.celeborn.common.protocol.MessageType
            at java.io.ObjectInputStream.readEnum(ObjectInputStream.java:2157)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1662)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
            at org.apache.celeborn.common.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
            at org.apache.celeborn.common.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:110)
        ```
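
The namespace migration from `celeborn.ha.master.*` to `celeborn.master.ha.*` described in this guide can be applied to an existing properties file with a short script. The following is a minimal sketch, not an official tool: `migrate_ha_keys` is a hypothetical helper, and it assumes that only the prefix changes while the rest of each key stays the same, which should be checked against the configuration reference before relying on it.

```python
OLD_PREFIX = "celeborn.ha.master."
NEW_PREFIX = "celeborn.master.ha."


def migrate_ha_keys(lines):
    """Rewrite keys under the deprecated prefix to the new prefix.

    Takes the lines of a properties file and returns a new list.
    Comments, blank lines, and keys in other namespaces pass through unchanged.
    """
    migrated = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(OLD_PREFIX):
            # Preserve any leading whitespace, then swap only the prefix.
            indent = line[: len(line) - len(stripped)]
            line = indent + NEW_PREFIX + stripped[len(OLD_PREFIX):]
        migrated.append(line)
    return migrated
```

The function works on a list of lines, so it can be fed from `open(path).read().splitlines()` and the result written back once the rewritten keys have been reviewed.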