celeborn/docs/migration.md
Cheng Pan ac84d64d51 [CELEBORN-707][MASTER] Remove env CELEBORN_MASTER_HOST and CELEBORN_MASTER_PORT
### What changes were proposed in this pull request?

Remove environment variables `CELEBORN_MASTER_HOST` and `CELEBORN_MASTER_PORT`, and make `CELEBORN_LOCAL_HOSTNAME` take effect on both master and worker.

### Why are the changes needed?

There are many different ways to configure the master/worker host and port, which makes configuration complex and inconsistent.

After this change:

#### master

1. CLI args `--host` and `--port` take the highest priority
2. then look up the env `CELEBORN_LOCAL_HOSTNAME`
3. the remaining behavior depends on whether HA is enabled
  3.1. when HA is disabled, look up the configurations `celeborn.master.host` and `celeborn.master.port`
  3.2. when HA is enabled, each node needs to know the whole cluster info:
     ```
     celeborn.master.ha.node.1.host clb-1
     celeborn.master.ha.node.1.port 9097
     celeborn.master.ha.node.2.host clb-2
     celeborn.master.ha.node.2.port 9097
     celeborn.master.ha.node.3.host clb-3
     celeborn.master.ha.node.3.port 9097
     ```
     In addition, `celeborn.master.ha.node.id=1` can be used to specify the node id; otherwise, the master tries to bind each configured host to determine its own node id.
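
The master-side lookup order above can be sketched in Python. This is a minimal, non-authoritative sketch, not Celeborn's actual (Scala) implementation: the functions `resolve_master_endpoint`, `ha_nodes` and `_can_bind_locally` are hypothetical, the HA switch is assumed to be `celeborn.master.ha.enabled`, the CLI/env overrides are assumed to apply in HA mode too, and "try to bind each host" is approximated by binding a socket to each configured host on an ephemeral port.

```python
import os
import socket
from typing import Callable, Dict, Mapping, Optional, Tuple

NODE_PREFIX = "celeborn.master.ha.node."


def _can_bind_locally(host: str) -> bool:
    """Return True if this machine can bind a socket to `host` (hypothetical check)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0: the OS picks any free port
        return True
    except OSError:  # unresolvable or non-local address
        return False


def ha_nodes(conf: Mapping[str, str]) -> Dict[str, Tuple[str, int]]:
    """Collect celeborn.master.ha.node.<id>.host/.port into {id: (host, port)}."""
    nodes = {}
    for key, host in conf.items():
        if key.startswith(NODE_PREFIX) and key.endswith(".host"):
            node_id = key[len(NODE_PREFIX):-len(".host")]
            nodes[node_id] = (host, int(conf[f"{NODE_PREFIX}{node_id}.port"]))
    return nodes


def resolve_master_endpoint(
    conf: Mapping[str, str],
    cli_host: Optional[str] = None,
    cli_port: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
    can_bind: Callable[[str], bool] = _can_bind_locally,
) -> Tuple[str, int]:
    """Return (host, port) following the lookup order described above."""
    # Steps 1 and 2: CLI args first, then CELEBORN_LOCAL_HOSTNAME (host only).
    override_host = cli_host or env.get("CELEBORN_LOCAL_HOSTNAME")

    # Assumed HA switch; treated as disabled when absent.
    if conf.get("celeborn.master.ha.enabled", "false").lower() != "true":
        # Step 3.1: HA disabled, fall back to celeborn.master.host/port.
        host = override_host or conf["celeborn.master.host"]
        port = cli_port if cli_port is not None else int(conf["celeborn.master.port"])
        return host, port

    # Step 3.2: HA enabled, every node sees the whole cluster.
    nodes = ha_nodes(conf)
    node_id = conf.get("celeborn.master.ha.node.id")
    if node_id is None:
        # No explicit id: pick the single node whose host can be bound locally.
        matches = [nid for nid, (host, _) in nodes.items() if can_bind(host)]
        if len(matches) != 1:
            raise ValueError(f"cannot determine HA node id, candidates: {matches}")
        node_id = matches[0]
    host, port = nodes[node_id]
    return override_host or host, cli_port if cli_port is not None else port
```

Passing `can_bind` as a parameter keeps the HA branch testable without depending on the machine's real network interfaces.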

#### worker

1. CLI args `--host` and `--port` take the highest priority
2. then look up the env `CELEBORN_LOCAL_HOSTNAME`

Things are simpler than in the master case because each worker does not need to know about the others.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

UT.

Closes #1616 from pan3793/CELEBORN-707.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-06-25 16:00:59 +08:00


---
hide:
  - navigation
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
# Migration Guide
## Upgrading from 0.2.1 to 0.3.0
- Since 0.3.0, the default value of `celeborn.client.push.replicate.enabled` has changed from `true` to `false`. Users
who want replication must enable it explicitly. For example, Spark users can enable replication by adding the following Spark
config when submitting a job: `spark.celeborn.client.push.replicate.enabled=true`.
- Since 0.3.0, the default value of `celeborn.worker.storage.workingDir` has changed from `hadoop/rss-worker/shuffle_data` to `rss-worker/shuffle_data`.
Users who want to keep the original working directory path should set this configuration explicitly.
- Since 0.3.0, the configuration namespace `celeborn.ha.master` is deprecated and will be removed in a future version.
All `celeborn.ha.master.*` configurations should be migrated to `celeborn.master.ha.*`.
- Since 0.3.0, the environment variables `CELEBORN_MASTER_HOST` and `CELEBORN_MASTER_PORT` have been removed.
Instead, `CELEBORN_LOCAL_HOSTNAME` works on both master and worker, and takes higher priority than the configurations defined in the properties file.
- When using 0.2.1 on the client side and 0.3.0 on the server side, you may see the following exception in the LifecycleManager's
log. It can be safely ignored; it is caused by a behavior change in how the Master handles heartbeats from the Application.
??? warning "logs"
    ```
    23/06/20 18:12:30 WARN TransportChannelHandler: Exception in connection from /192.168.1.16:9097
    java.io.InvalidObjectException: enum constant HEARTBEAT_FROM_APPLICATION_RESPONSE does not exist in class org.apache.celeborn.common.protocol.MessageType
        at java.io.ObjectInputStream.readEnum(ObjectInputStream.java:2157)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1662)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
        at org.apache.celeborn.common.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.celeborn.common.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:110)
    ```
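
The deprecated `celeborn.ha.master` namespace and the removed `CELEBORN_MASTER_HOST`/`CELEBORN_MASTER_PORT` variables can be checked mechanically before upgrading. Below is a minimal Python sketch, not a tool shipped with Celeborn: the helpers `migrate_ha_keys` and `find_removed_env_vars` are hypothetical, the key rewrite assumes every `celeborn.ha.master.<suffix>` maps to `celeborn.master.ha.<suffix>` with the suffix unchanged, and the env scan assumes plain `NAME=value` or `export NAME=value` assignments in a shell script such as `celeborn-env.sh`.

```python
import re
from typing import List, Tuple

OLD_PREFIX = "celeborn.ha.master."
NEW_PREFIX = "celeborn.master.ha."
REMOVED_ENV_VARS = ("CELEBORN_MASTER_HOST", "CELEBORN_MASTER_PORT")

# Matches `NAME=...` or `export NAME=...` at the start of a line, capturing NAME.
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=")


def migrate_ha_keys(conf_text: str) -> str:
    """Rename deprecated celeborn.ha.master.* keys to celeborn.master.ha.*.

    Assumes one entry per line with the key first; comments and all other
    lines are passed through unchanged.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(OLD_PREFIX):
            indent = line[: len(line) - len(stripped)]  # preserve leading whitespace
            line = indent + NEW_PREFIX + stripped[len(OLD_PREFIX):]
        out.append(line)
    return "".join(out)


def find_removed_env_vars(env_script: str) -> List[Tuple[int, str]]:
    """Return (line number, name) for each assignment of a removed variable."""
    hits = []
    for lineno, line in enumerate(env_script.splitlines(), start=1):
        match = _ASSIGNMENT.match(line)  # commented-out lines do not match
        if match and match.group(1) in REMOVED_ENV_VARS:
            hits.append((lineno, match.group(1)))
    return hits
```

Any hit reported by `find_removed_env_vars` should be replaced by `CELEBORN_LOCAL_HOSTNAME` for the host; the master port then comes from the `--port` argument or the properties file.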