### What changes were proposed in this pull request?
Add --add-opens to bootstrap shell scripts
### Why are the changes needed?
Additional `--add-opens` options are required for Java 17. Note that the `--add-opens` list is copied from Spark and was used for UT; I am not sure each entry is required, but at least the UT passed with them.
Details supplied by cfmcgrady
[JEP 403](https://openjdk.java.net/jeps/403), targeted for [JDK 17](https://openjdk.java.net/projects/jdk/17/), removes the `--illegal-access` flag, which makes the behavior equivalent to `--illegal-access=deny`.
This means that using reflection to invoke protected methods of exported `java.*` APIs no longer works by default. For example:
```shell
> /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell
| Welcome to JShell -- Version 17.0.7
| For an introduction type: /help intro
jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]
jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);
| Exception java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module 34c45dca
| at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:354)
| at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297)
| at Constructor.checkCanSetAccessible (Constructor.java:188)
| at Constructor.setAccessible (Constructor.java:181)
| at (#2:1)
jshell>
```
```shell
> /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell -R --add-opens=java.base/java.nio=ALL-UNNAMED
| Welcome to JShell -- Version 17.0.7
| For an introduction type: /help intro
jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]
jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);
jshell>
```
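The same check can be sketched as a small Java program that can be run with and without `--add-opens=java.base/java.nio=ALL-UNNAMED`. This is only an illustration: the class name `AddOpensProbe` and its three result strings are made up for this example, and the `(long, int)` constructor is assumed to exist only because the jshell session above shows it on 17.0.7.

```java
import java.lang.reflect.Constructor;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Celeborn: probes whether java.nio is opened
// to the unnamed module for deep reflection.
class AddOpensProbe {

    /** Returns "opened", "blocked", or "missing" (constructor not present on this JDK). */
    static String probe() {
        ByteBuffer direct = ByteBuffer.allocateDirect(1);
        try {
            Constructor<?> ctor =
                direct.getClass().getDeclaredConstructor(long.class, int.class);
            ctor.setAccessible(true);
            return "opened";
        } catch (NoSuchMethodException e) {
            // Other JDK versions may declare a different constructor signature.
            return "missing";
        } catch (RuntimeException e) {
            // On JDK 9+ this is java.lang.reflect.InaccessibleObjectException.
            return "blocked";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

Which value is printed depends on the JDK version and on whether the `--add-opens` flag is passed to the `java` launcher.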
### Does this PR introduce _any_ user-facing change?
Yes, for Java 17 support.
### How was this patch tested?
CI and review
Closes #1677 from pan3793/CELEBORN-763.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Always set JVM opts `-XX:+IgnoreUnrecognizedVMOptions`
### Why are the changes needed?
By default, the JVM fails to start when unrecognized options are set, which is unfriendly to users who want to run on different JDK versions.
### Does this PR introduce _any_ user-facing change?
Yes, users can successfully start Celeborn even if they provide JVM options that their JDK does not recognize.
### How was this patch tested?
Review.
Closes #1676 from pan3793/CELEBORN-762.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Rename the remaining RSS-related class names, file names, etc.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1664 from AngersZhuuuu/CELEBORN-751.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
as title
### Why are the changes needed?
Mention the configuration behavior change in the migration guide.
### Does this PR introduce _any_ user-facing change?
Yes, the migration guide is updated
### How was this patch tested?
review
Closes #1673 from pan3793/CELEBORN-637-followup.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
eliminate comments introduced in https://github.com/apache/incubator-celeborn/pull/1650
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #1672 from cfmcgrady/primary-replica-followup.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
- gauge method definition improvement. i.e.
before
```
def addGauge[T](name: String, f: Unit => T, labels: Map[String, String])
```
after
```
def addGauge[T](name: String, labels: Map[String, String])(f: () => T)
```
which improves the caller-side code style
```
addGauge(name, labels) { () =>
...
}
```
- remove unnecessary Java/Scala collection conversion. Since Scala 2.11 does not support SAM, the extra implicit function is required.
- leverage Logging to defer message evaluation
- UPPER_CASE string constants
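
As a rough illustration of what "defer message evaluation" buys, here is a minimal Java sketch. It is not Celeborn's `Logging` trait: `LazyLogSketch`, `logDebug`, and `expensiveSummary` are hypothetical names, and a `Supplier` stands in for Scala's by-name parameter.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a logging helper that takes the message lazily.
class LazyLogSketch {
    static int evaluations = 0;
    static boolean debugEnabled = false;

    // The message is only built when the level is enabled.
    static void logDebug(Supplier<String> message) {
        if (debugEnabled) {
            System.out.println("DEBUG " + message.get());
        }
    }

    // Pretend this string is costly to build; count how often it is built.
    static String expensiveSummary() {
        evaluations++;
        return "gauges=" + 42;
    }

    public static void main(String[] args) {
        logDebug(() -> expensiveSummary()); // level disabled: supplier never runs
        debugEnabled = true;
        logDebug(() -> expensiveSummary()); // level enabled: supplier runs once
        System.out.println("evaluations = " + evaluations);
    }
}
```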
### Why are the changes needed?
Improve code quality and, possibly, performance.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA.
Closes #1670 from pan3793/CELEBORN-757.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Match the TransportMessage type by number rather than by enum name, so that MessageType names can be changed after this PR.
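
A minimal sketch of the idea, assuming nothing about the real `MessageType` definition: the enum constants, their numbers, and `fromNumber` below are hypothetical. The point is that the lookup key is the number, so a constant can be renamed without affecting what goes over the wire.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical message-type registry keyed by number instead of enum name.
class MessageTypeSketch {

    enum MessageType {
        REGISTER_WORKER(1),
        HEARTBEAT_FROM_WORKER(2); // renaming this constant keeps number 2 stable

        final int number;

        MessageType(int number) {
            this.number = number;
        }

        private static final Map<Integer, MessageType> BY_NUMBER = new HashMap<>();

        static {
            for (MessageType t : values()) {
                BY_NUMBER.put(t.number, t);
            }
        }

        // Resolve the type from the number carried by the message.
        static MessageType fromNumber(int number) {
            MessageType t = BY_NUMBER.get(number);
            if (t == null) {
                throw new IllegalArgumentException("Unknown message type number: " + number);
            }
            return t;
        }
    }

    public static void main(String[] args) {
        System.out.println(MessageType.fromNumber(2));
    }
}
```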
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1658 from AngersZhuuuu/CELEBORN-745.
Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
…sion disabled
### What changes were proposed in this pull request?
Avoid memory copy for code path where compression is disabled. Followup of https://github.com/apache/incubator-celeborn/pull/1669
### Why are the changes needed?
ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #1671 from waitinfuture/755.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Support deciding, through configuration, whether to compress shuffle data.
### Why are the changes needed?
Currently, Celeborn compresses all shuffle data, but for example, the shuffle data of Gluten has already been compressed. In this case, no additional compression is required. Therefore, configuration needs to be provided for users to decide whether to use Celeborn’s compression according to the actual situation.
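
A minimal sketch of such a switch, under loud assumptions: `CompressionToggleSketch` and `prepare` are hypothetical, and `java.util.zip.Deflater` is only a stand-in for whatever codec the client actually uses. When compression is disabled, the input array is returned as-is, without a copy.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical helper: compress only when the (assumed) config flag is on.
class CompressionToggleSketch {

    static byte[] prepare(byte[] data, boolean compressionEnabled) {
        if (!compressionEnabled) {
            // Already-compressed data (e.g. from Gluten) passes through untouched.
            return data;
        }
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000]; // highly compressible: all zeros
        System.out.println("disabled: " + prepare(data, false).length + " bytes");
        System.out.println("enabled:  " + prepare(data, true).length + " bytes");
    }
}
```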
### Does this PR introduce _any_ user-facing change?
no.
Closes #1669 from kerwin-zk/celeborn-755.
Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Provide a new SparkShuffleManager to replace RssShuffleManager in the future
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1667 from AngersZhuuuu/CELEBORN-754.
Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Rename the Spark patch files to make their names clearer.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1666 from AngersZhuuuu/CELEBORN-753.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Update Grafana dashboard and its setup demo to remove the old name "RSS"
### Why are the changes needed?
Ditto.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
No test needed.
Closes #1663 from FMX/CELEBORN-749.
Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
The benchmark shows that `computeIfAbsent` still performs better in the simple case:
```
================================================================================================
HashMap
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Mac OS X 13.4.1
Apple M1 Pro
HashMap:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
putIfAbsent                                         701            702           0         95.7          10.4       1.0X
computeIfAbsent                                     534            535           1        125.6           8.0       1.3X
================================================================================================
ConcurrentHashMap
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Mac OS X 13.4.1
Apple M1 Pro
ConcurrentHashMap:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
putIfAbsent                                         712            716           3         94.2          10.6       1.0X
computeIfAbsent                                     702            705           2         95.6          10.5       1.0X
```
### Why are the changes needed?
Introduce a Benchmark framework for future performance sensitive case measurement.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA.
Closes #1657 from pan3793/CELEBORN-744.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Netty has exposed the public API `PlatformDependent.usedDirectMemory()` for reading Netty's used direct memory since [netty-4.1.35.Final](https://github.com/netty/netty/pull/8945); use it to simplify the logic.
### Why are the changes needed?
Simplify the logic for getting Netty's used direct memory.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GA
Closes #1662 from cfmcgrady/netty-used-memory.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
As title
### Why are the changes needed?
In order to distinguish it from the existing master/worker, refactor data replication terminology to 'primary/replica' for improved clarity and inclusivity in the codebase
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
existing tests
Closes #1639 from cfmcgrady/primary-replica.
Lead-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Rename RssHARetryClient to MasterClient
### Why are the changes needed?
Code refactor
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA.
Closes #1661 from AngersZhuuuu/CELEBORN-748.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Print time/bytes in human-readable format
### Why are the changes needed?
Make logs readable
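
For illustration only, a byte-count formatter along these lines could look like the sketch below. `HumanReadableSketch.bytesToString` is a hypothetical helper, not the utility this PR uses, and the unit names and one-decimal rounding are assumptions.

```java
import java.util.Locale;

// Hypothetical formatter: scales a byte count to the largest unit below 1024.
class HumanReadableSketch {

    static String bytesToString(long size) {
        String[] units = {"B", "KiB", "MiB", "GiB", "TiB"};
        double value = size;
        int i = 0;
        while (value >= 1024 && i < units.length - 1) {
            value /= 1024;
            i++;
        }
        // Locale.ROOT keeps the decimal separator stable across environments.
        return String.format(Locale.ROOT, "%.1f %s", value, units[i]);
    }

    public static void main(String[] args) {
        System.out.println(bytesToString(1536));
        System.out.println(bytesToString(1048576));
    }
}
```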
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA.
Closes #1659 from pan3793/minor.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Replace `putIfAbsent` with `computeIfAbsent` in `ConcurrentHashMap`
### Why are the changes needed?
Calling `putIfAbsent` always evaluates its value argument, even when the key is already present, which is wasteful if building the value is time-consuming. So we'd better replace `putIfAbsent` with `computeIfAbsent` in some critical paths.
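
The difference can be seen in a small, self-contained sketch (the class and method names are hypothetical and exist only for this illustration): the value passed to `putIfAbsent` is built on every call, while the mapping function given to `computeIfAbsent` runs only when the key is absent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Counts how often the "expensive" value is built under each API.
class ComputeIfAbsentSketch {
    static final AtomicInteger BUILDS = new AtomicInteger();

    static String expensiveValue(String key) {
        BUILDS.incrementAndGet();
        return "value-for-" + key;
    }

    static int buildsWithPutIfAbsent(int calls) {
        BUILDS.set(0);
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < calls; i++) {
            // The argument is evaluated before putIfAbsent is even entered.
            map.putIfAbsent("k", expensiveValue("k"));
        }
        return BUILDS.get();
    }

    static int buildsWithComputeIfAbsent(int calls) {
        BUILDS.set(0);
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < calls; i++) {
            // The function is invoked only when "k" has no mapping yet.
            map.computeIfAbsent("k", ComputeIfAbsentSketch::expensiveValue);
        }
        return BUILDS.get();
    }

    public static void main(String[] args) {
        System.out.println("putIfAbsent builds:     " + buildsWithPutIfAbsent(3));
        System.out.println("computeIfAbsent builds: " + buildsWithComputeIfAbsent(3));
    }
}
```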
### Does this PR introduce _any_ user-facing change?
No, it does not affect the user-facing API
### How was this patch tested?
current UT
Closes #1567 from cchung100m/CELEBORN-478.
Lead-authored-by: cchung100m <cchung100m@cs.ccu.edu.tw>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Co-authored-by: Neo Chien <cchung100m@cs.ccu.edu.tw>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Rename project files from rss-xx to celeborn-xx
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1660 from AngersZhuuuu/CELEBORN-746.
Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
Enable CI for Celeborn Master/Worker and Client with Spark 3.3/3.4
### Why are the changes needed?
Ensure Celeborn works on Java 17.
Note: there may be some code paths that are not covered by tests, we should fix them in the future.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA
Closes #1649 from pan3793/CELEBORN-738.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Rename HeartbeatResponse to HeartbeatFromWorkerResponse
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1651 from AngersZhuuuu/CELEBORN-739.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Bump Hadoop from 3.2.1 to 3.2.4.
### Why are the changes needed?
Always use the latest patched version.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA.
Closes #1654 from pan3793/CELEBORN-742.
Lead-authored-by: Cheng Pan <chengpan@apache.org>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
Bump Spark
- from 3.2.2 to 3.2.4
- from 3.3.1 to 3.3.2
- from 3.4.0 to 3.4.1
### Why are the changes needed?
Keep Spark versions up to date.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA.
Closes #1653 from pan3793/CELEBORN-741.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Remove usage of deprecated `java.security.AccessController`
### Why are the changes needed?
`AccessController` is deprecated for removal since Java 17
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/security/AccessController.html
Recover building for Java 17
```
[INFO] compiling 72 Scala sources and 209 Java sources to /home/runner/work/incubator-celeborn/incubator-celeborn/common/target/classes ...
Error: /home/runner/work/incubator-celeborn/incubator-celeborn/common/src/main/scala/org/apache/celeborn/common/serializer/SerializationDebugger.scala:71: class AccessController in package security is deprecated
Error: [ERROR] one error found
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
```
scala> System.getProperty("java.version")
res0: String = 1.8.0_332
scala> System.getProperty("sun.io.serialization.extendedDebugInfo")
res1: String = null
scala> java.lang.Boolean.getBoolean("sun.io.serialization.extendedDebugInfo")
res2: Boolean = false
```
Closes #1652 from pan3793/CELEBORN-740.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR is an integral part of #1639. It primarily focuses on updating configuration settings and metrics terminology, while preserving compatibility with older client versions by not introducing any RPC-related changes.
### Why are the changes needed?
In order to distinguish it from the existing master/worker, refactor data replication terminology to 'primary/replica' for improved clarity and inclusivity in the codebase
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
existing tests.
Closes #1650 from cfmcgrady/primary-replica-metrics.
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Remove unused RPC GetWorkerInfo & GetWorkerInfosResponse
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1647 from AngersZhuuuu/CELEBORN-735.
Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
Remove unused RPC ReregisterWorkerResonse
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1646 from AngersZhuuuu/CELEBORN-734.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
In this PR, we rename all RPC blacklist fields; this does not cause compatibility issues.
The RPCs `GetBlacklist` and `GetBlacklistResponse` are left unchanged: they will not be used in the next release, so both can be removed then.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1643 from AngersZhuuuu/CELEBORN-666-RPC.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Remove unused RPC ThreadDump & ThreadDumpResponse
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1645 from AngersZhuuuu/CELEBORN-732.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Remove unused SlaveLostResponse
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1644 from AngersZhuuuu/CELEBORN-730.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Refine the congestion-related code, logs, and comments
### Why are the changes needed?
ditto
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual test
Closes #1637 from onebox-li/improve-congestion.
Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix typo `numMapppers`, should be `numMappers`
### Why are the changes needed?
Fix typo
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Protobuf serialization depends on the message field sequence number, not the field name.
Closes #1642 from pan3793/CELEBORN-729.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Fix the flaky test by enlarging `celeborn.client.shuffle.expired.checkInterval`
### Why are the changes needed?
```
RssHashCheckDiskSuite:
- celeborn spark integration test - hash-checkDiskFull *** FAILED ***
868 was not less than 0 (RssHashCheckDiskSuite.scala:83)
```
https://github.com/apache/incubator-celeborn/actions/runs/5396767745/jobs/9800766633
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA; CI should continue to be observed.
Closes #1640 from pan3793/CELEBORN-727.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Clean up remnant application directories after a Celeborn Worker is restarted.
### Why are the changes needed?
Remnant application directories will not be deleted, because `hadoopFs.listFiles(path,false)` will not list directories.
### Does this PR introduce _any_ user-facing change?
No.
Closes #1641 from Demon-Liang/0.3-dev.
Authored-by: Demon Liang <liangdingwen.ldw@alipay.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
(cherry picked from commit 42a9160c8ceaf79bae514c54dafcb5b8e12d5251)
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
When removing a newly allocated location's workers from pushExcludedWorkers, their peers should be removed as well
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1636 from AngersZhuuuu/CELEBORN-696-FOLLOWUP.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Unify all blacklist-related code and comments
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1638 from AngersZhuuuu/CELEBORN-666-FOLLOWUP.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
…nse with lower versions
### What changes were proposed in this pull request?
The master side checks the `reply` field of HeartbeatFromApplication. If `reply` is true, it replies with HeartbeatFromApplicationResponse; otherwise it replies with OneWayMessageResponse.
The `reply` field defaults to false before version 0.2.1, so the master stays compatible with older client versions.
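
The branching described above can be sketched as follows. This is a simplified model, not the actual master code: the nested classes only borrow the message names from this description, and the `appId` field and `handle` method are hypothetical.

```java
// Simplified model of the master-side decision on which response type to send.
class HeartbeatReplySketch {

    interface Response {}

    static final class OneWayMessageResponse implements Response {}

    static final class HeartbeatFromApplicationResponse implements Response {}

    static final class HeartbeatFromApplication {
        final String appId;
        final boolean reply; // absent on old clients, so it reads as false

        HeartbeatFromApplication(String appId, boolean reply) {
            this.appId = appId;
            this.reply = reply;
        }
    }

    static Response handle(HeartbeatFromApplication heartbeat) {
        if (heartbeat.reply) {
            // New clients ask for, and can deserialize, the richer response.
            return new HeartbeatFromApplicationResponse();
        }
        // Old clients only know OneWayMessageResponse.
        return new OneWayMessageResponse();
    }

    public static void main(String[] args) {
        System.out.println(handle(new HeartbeatFromApplication("app-old", false)).getClass().getSimpleName());
        System.out.println(handle(new HeartbeatFromApplication("app-new", true)).getClass().getSimpleName());
    }
}
```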
### Why are the changes needed?
Before version `0.2.1`, the response to HeartbeatFromApplication is `OneWayMessageResponse`, but from `0.3.0` the response is changed to `HeartbeatFromApplicationResponse`.
If the client side is at version `0.2.1` and the server side is at version `0.3.0`, a compatibility issue occurs.
The following compatibility errors are printed.
``` java
java.io.InvalidObjectException: enum constant HEARTBEAT_FROM_APPLICATION_RESPONSE does not exist in class org.apache.celeborn.common.protocol.MessageType
at java.io.ObjectInputStream.readEnum(ObjectInputStream.java:2157) ~[?:1.8.0_362]
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1662) ~[?:1.8.0_362]
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430) ~[?:1.8.0_362]
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354) ~[?:1.8.0_362]
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212) ~[?:1.8.0_362]
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668) ~[?:1.8.0_362]
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502) ~[?:1.8.0_362]
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460) ~[?:1.8.0_362]
at org.apache.celeborn.common.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) ~[celeborn-client-spark-3-shaded_2.12-0.2.1-incubating.jar:?]
```
``` java
Caused by: java.lang.ClassCastException: Cannot cast org.apache.celeborn.common.protocol.message.ControlMessages$HeartbeatFromApplicationResponse to org.apache.celeborn.common.protocol.message.ControlMessages$OneWayMessageResponse$
at java.lang.Class.cast(Class.java:3369) ~[?:1.8.0_362]
at scala.concurrent.Future.$anonfun$mapTo$1(Future.scala:500) ~[scala-library-2.12.15.jar:?]
at scala.util.Success.$anonfun$map$1(Try.scala:255) ~[scala-library-2.12.15.jar:?]
at scala.util.Success.map(Try.scala:213) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:67) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:82) ~[scala-library-2.12.15.jar:?]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:59) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:875) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:110) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:107) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:873) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.Promise.trySuccess(Promise.scala:94) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.Promise.trySuccess$(Promise.scala:94) ~[scala-library-2.12.15.jar:?]
at scala.concurrent.impl.Promise$DefaultPromise.trySuccess(Promise.scala:187) ~[scala-library-2.12.15.jar:?]
at org.apache.celeborn.common.rpc.netty.NettyRpcEnv.onSuccess$1(NettyRpcEnv.scala:218) ~[celeborn-client-spark-3-shaded_2.12-0.2.1-incubating.jar:?]
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
The PR was tested manually, as follows:
1. The server side is deployed using the code of the latest branch-0.3.
2. The Spark client is deployed at version 0.2.1, then spark-sql runs 3 TPC-DS queries (query1.sql/query2.sql/query3.sql, with a data size of 1T); finally, verify that the queries succeed and that none of the compatibility errors above are printed.
3. The Spark client is deployed at version 0.3.0, then spark-sql runs 3 TPC-DS queries (query1.sql/query2.sql/query3.sql, with a data size of 1T); finally, verify that the queries succeed and that none of the compatibility errors above are printed.
This patch had conflicts when merged, resolved by
Committer: Cheng Pan <chengpan@apache.org>
Closes #1635 from zhongqiangczq/heartbeat2.
Authored-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Unify exclude and blacklist related configuration
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1633 from AngersZhuuuu/CELEBORN-666-NEW.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
### What changes were proposed in this pull request?
This PR batches revive requests and periodically sends them to LifecycleManager to reduce the number of RPC requests.
In more detail: this PR changes the Revive message to support multiple unique partitions, and also passes a set of unique mapIds for checking MapEnd. Each time ShuffleClientImpl wants to revive, it adds a ReviveRequest to ReviveManager and waits for the result. ReviveManager batches revive requests and periodically sends them to LifecycleManager (deduplicated by partitionId). LifecycleManager constructs a ChangeLocationsCallContext and, after all locations are notified, replies to ShuffleClientImpl.
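
The batching and deduplication step can be sketched roughly as below. This is not the actual ReviveManager: `ReviveBatcherSketch`, its fields, and `drain` are hypothetical, requests are assumed to belong to a single shuffle, and the periodic timer and the reply path back to the caller are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batcher: collects revive requests and keeps one per partitionId.
class ReviveBatcherSketch {

    static final class ReviveRequest {
        final int mapId;
        final int partitionId;

        ReviveRequest(int mapId, int partitionId) {
            this.mapId = mapId;
            this.partitionId = partitionId;
        }
    }

    // Insertion-ordered so the batch preserves arrival order of first requests.
    private final Map<Integer, ReviveRequest> pending = new LinkedHashMap<>();

    // Called by writers; a later request for the same partition is dropped.
    synchronized void add(ReviveRequest request) {
        pending.putIfAbsent(request.partitionId, request);
    }

    // Called periodically (timer not shown); returns one batch and resets state.
    synchronized List<ReviveRequest> drain() {
        List<ReviveRequest> batch = new ArrayList<>(pending.values());
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        ReviveBatcherSketch batcher = new ReviveBatcherSketch();
        batcher.add(new ReviveRequest(1, 666));
        batcher.add(new ReviveRequest(2, 666)); // same partition, deduplicated
        batcher.add(new ReviveRequest(3, 667));
        System.out.println("batch size = " + batcher.drain().size());
    }
}
```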
### Why are the changes needed?
In my test of 3T TPCDS q23a with 3 Celeborn workers, when a worker is killed, the LifecycleManager receives 4.8w (about 48,000) Revive requests:
```
[emr-usermaster-1-1 logs]$ cat spark-emr-user-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master-1-1.c-fa08904e94c028d1.out.1 |grep -i revive |wc -l
64364
```
After this PR, the number of ReviveBatch requests drops to 708:
```
[emr-usermaster-1-1 logs]$ cat spark-emr-user-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master-1-1.c-fa08904e94c028d1.out |grep -i revive |wc -l
2573
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test. I have tested:
1. Disable graceful shutdown, kill one worker, job succeeds
2. Disable graceful shutdown, kill two workers successively, job fails as expected
3. Enable graceful shutdown, restart two workers successively, job succeeds
4. Enable graceful shutdown, restart two workers successively, then kill the third one, job succeeds
Closes #1588 from waitinfuture/656-2.
Lead-authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Co-authored-by: Keyong Zhou <zhouky@apache.org>
Co-authored-by: Keyong Zhou <waitinfuture@gmail.com>
Signed-off-by: Shuang <lvshuang.tb@gmail.com>
### What changes were proposed in this pull request?
The metrics update logic needs to align with Flink 1.17/1.15
### Why are the changes needed?
See [1626](https://github.com/apache/incubator-celeborn/pull/1626). The metrics update logic also needs to align with Flink 1.17/1.15.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual TPC-DS test
Closes #1631 from RexXiong/CELEBORN-717-FOLLOWUP.
Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>
### What changes were proposed in this pull request?
Fixes a concurrency bug in ChangePartitionManager.
### Why are the changes needed?
Before this PR, ```ChangePartitionManager.start``` tries to synchronize on ```requests``` in the body
of ```run()```, but the synchronized keyword was put outside of the ```batchHandleChangePartitionExecutors.submit```,
which has no effect.
When I was testing https://github.com/apache/incubator-celeborn/pull/1588 , I encountered unexpected situations that
when all ```rss-lifecycle-manager-change-partition-executor``` threads are idle, the ```inBatchPartitions``` is still not
empty:
```
23/06/27 20:35:55 INFO ChangePartitionManager: Inside run, shuffleId 0 inBatchPartitions size 834
```
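
The misplaced `synchronized` can be reproduced in isolation. In the sketch below (hypothetical class and method names, not the ChangePartitionManager code), the first variant wraps only the `submit` call, so the submitting thread holds the monitor and the worker thread running the task does not; the second variant takes the lock inside the task body.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Shows which thread actually holds the lock while the task body runs.
class SynchronizedPlacementSketch {
    static final Object LOCK = new Object();

    private static boolean run(Callable<Boolean> task, boolean lockAroundSubmit) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            if (lockAroundSubmit) {
                synchronized (LOCK) {
                    // Only the submit call is guarded; the task runs on another thread.
                    return pool.submit(task).get();
                }
            }
            return pool.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Pattern described as ineffective: synchronized outside submit.
    static boolean lockHeldWhenSynchronizedOutsideSubmit() {
        return run(() -> Thread.holdsLock(LOCK), true);
    }

    // Fixed pattern: synchronized inside the task body.
    static boolean lockHeldWhenSynchronizedInsideTask() {
        return run(() -> {
            synchronized (LOCK) {
                return Thread.holdsLock(LOCK);
            }
        }, false);
    }

    public static void main(String[] args) {
        System.out.println("outside submit: " + lockHeldWhenSynchronizedOutsideSubmit());
        System.out.println("inside task:    " + lockHeldWhenSynchronizedInsideTask());
    }
}
```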
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes #1634 from waitinfuture/721.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Currently SortBasedShuffleWriter does not update peakMemoryUsedBytes; this PR adds support for it.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1632 from AngersZhuuuu/CELEBORN-720.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
1. Celeborn supports storage type selection. HDD, SSD, and HDFS are available for now.
2. Add new buffer size for HDFS file writers.
3. Workers support empty working dirs.
### Why are the changes needed?
Support the HDFS-only scenario.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
UT and cluster.
Closes #1619 from FMX/CELEBORN-568.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
The Hadoop configuration generated by Celeborn should respect the Celeborn conf
### Why are the changes needed?
On the Spark client side, settings are written as `spark.celeborn.hadoop.xxx.xx`.
On the server side, settings are written as `celeborn.hadoop.xxx.xxx`.
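
One way to picture the prefix handling is the sketch below. It is an assumption-laden illustration, not the code in this PR: `HadoopConfPrefixSketch` and `extractHadoopConf` are hypothetical, and the sketch simply strips an optional `spark.` prefix and then the `celeborn.hadoop.` prefix to obtain the Hadoop key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical extraction of Hadoop settings from Celeborn-prefixed keys.
class HadoopConfPrefixSketch {
    static final String SPARK_PREFIX = "spark.";
    static final String CELEBORN_HADOOP_PREFIX = "celeborn.hadoop.";

    static Map<String, String> extractHadoopConf(Map<String, String> conf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String key = entry.getKey();
            // Client-side keys carry an extra "spark." in front.
            if (key.startsWith(SPARK_PREFIX)) {
                key = key.substring(SPARK_PREFIX.length());
            }
            if (key.startsWith(CELEBORN_HADOOP_PREFIX)) {
                hadoopConf.put(key.substring(CELEBORN_HADOOP_PREFIX.length()), entry.getValue());
            }
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.celeborn.hadoop.fs.defaultFS", "hdfs://example:8020"); // made-up value
        conf.put("celeborn.hadoop.dfs.replication", "2");
        conf.put("spark.executor.memory", "4g"); // unrelated key, ignored
        System.out.println(extractHadoopConf(conf));
    }
}
```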
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1629 from AngersZhuuuu/CELEBORN-719.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>
…s excluded or not
### What changes were proposed in this pull request?
This PR makes ReviveTimes decrease regardless of whether the partition location is excluded.
### Why are the changes needed?
Consider the following test setup:
- 3 Celeborn workers
- Client-side blacklist enabled: `spark.celeborn.client.push.blacklist.enabled=true`
- Replication enabled: `spark.celeborn.client.push.replicate.enabled=true`
- 2 workers killed in succession
I expected the tasks to fail because of revive failure (with replication on, at least 2 workers are needed), but instead
the tasks hang forever. The logs show that `remain revive times` never decreases, leading
to an infinite revive loop.
```
23/06/27 14:00:57 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:01 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:05 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:09 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:13 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:17 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:21 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:25 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:29 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:33 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:37 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:41 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:45 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:49 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:53 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:01:57 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:02:01 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:02:05 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:02:09 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
23/06/27 14:02:13 ERROR ShuffleClientImpl: Push data to xxx:xxx failed for shuffle 0 map 998 attempt 1 partition 666 batch 1, remain revive times 5.
```
The cause is that, before this PR, the revive times did not decrease if the partition location was excluded. I don't see a
reason for that behavior.
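The loop can be pictured with the simplified sketch below. It is not the `ShuffleClientImpl` code: the method, its parameters, and the safety cap are hypothetical, and the push is assumed to fail every time. It only shows why a counter that is skipped for excluded locations never reaches zero.

```java
final class ReviveLoopSketch {
    // Cap added only so the "old behavior" demo terminates instead of hanging.
    static final int SAFETY_CAP = 100;

    // Returns how many failed push attempts happen before the loop exits.
    static int failedAttempts(int maxReviveTimes,
                              boolean locationExcluded,
                              boolean decrementWhenExcluded) {
        int remainReviveTimes = maxReviveTimes;
        int attempts = 0;
        while (remainReviveTimes > 0 && attempts < SAFETY_CAP) {
            attempts++; // the push is assumed to fail on every attempt
            if (!locationExcluded || decrementWhenExcluded) {
                remainReviveTimes--;
            }
            // Old behavior: an excluded location skips the decrement,
            // so remainReviveTimes stays at its initial value.
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println("old behavior: " + failedAttempts(5, true, false));
        System.out.println("new behavior: " + failedAttempts(5, true, true));
    }
}
```

With the decrement applied unconditionally, the loop ends after `maxReviveTimes` failures and the task can fail as expected; without it, only the artificial cap stops the loop.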
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes #1628 from waitinfuture/718.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Reset numBytesOut/numBuffersOut metrics for RemoteShuffleResultPartition
### Why are the changes needed?
Currently, ResultPartition loses the numBytesOut/numBuffersOut metrics, so the Flink AdaptiveScheduler cannot dynamically adjust task parallelism based on the amount of input data.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual test.
Closes #1626 from RexXiong/CELEBORN-717.
Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
### What changes were proposed in this pull request?
This PR makes network local address binding support both the IP and the FQDN strategy.
Additionally, it refactors `ShuffleClientImpl#genAddressPair` to return a `Pair<String, String>` instead of a `${hostAndPort}-${hostAndPort}` string. The string form works with IP addresses but may break with FQDNs, because an FQDN may contain `-`.
### Why are the changes needed?
Currently, when the bind hostname is not set explicitly, Celeborn finds the first non-loopback address and always binds to its IP. This is not suitable for K8s, where a StatefulSet (STS) Pod has a stable FQDN but its IP changes whenever the Pod restarts.
`ShuffleClientImpl#genAddressPair` must be changed as well; otherwise it may cause:
```
java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 11657 in stage 0.0 failed 4 times, most recent failure: Lost task 11657.3 in stage 0.0 (TID 12747) (10.153.253.198 executor 157): java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.celeborn.client.ShuffleClientImpl.doPushMergedData(ShuffleClientImpl.java:874)
at org.apache.celeborn.client.ShuffleClientImpl.pushOrMergeData(ShuffleClientImpl.java:735)
at org.apache.celeborn.client.ShuffleClientImpl.mergeData(ShuffleClientImpl.java:827)
at org.apache.spark.shuffle.celeborn.SortBasedPusher.pushData(SortBasedPusher.java:140)
at org.apache.spark.shuffle.celeborn.SortBasedPusher.insertRecord(SortBasedPusher.java:192)
at org.apache.spark.shuffle.celeborn.SortBasedShuffleWriter.fastWrite0(SortBasedShuffleWriter.java:192)
at org.apache.spark.shuffle.celeborn.SortBasedShuffleWriter.write(SortBasedShuffleWriter.java:145)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1508)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
```
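The ambiguity of the `-`-joined form can be shown with a small sketch. The host names and port below are made up, and the helper methods are hypothetical stand-ins rather than the real `genAddressPair`; `Map.Entry` is used here in place of the `Pair<String, String>` mentioned above.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

final class AddressPairSketch {
    // Old style: both addresses joined into one string with "-".
    static String joinedKey(String primary, String replica) {
        return primary + "-" + replica;
    }

    // New style: the two addresses stay separate, so no parsing is needed.
    static Map.Entry<String, String> pairKey(String primary, String replica) {
        return new SimpleImmutableEntry<>(primary, replica);
    }

    public static void main(String[] args) {
        // With IPs, splitting on "-" yields exactly two parts.
        String[] ipParts = joinedKey("10.0.0.1:9097", "10.0.0.2:9097").split("-");
        System.out.println(ipParts.length);

        // With FQDNs that contain "-", the split yields more than two parts,
        // so code that expects parts[0] and parts[1] to be host:port breaks.
        String primary = "celeborn-worker-4.example:9097";
        String replica = "celeborn-worker-5.example:9097";
        String[] fqdnParts = joinedKey(primary, replica).split("-");
        System.out.println(fqdnParts.length);

        Map.Entry<String, String> pair = pairKey(primary, replica);
        System.out.println(pair.getKey() + " / " + pair.getValue());
    }
}
```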
### Does this PR introduce _any_ user-facing change?
Yes, a new configuration `celeborn.network.bind.preferIpAddress` is introduced, and the default value is `true` to preserve the existing behavior.
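A minimal sketch of the choice this flag controls, using only `java.net.InetAddress`. The class and method are hypothetical, and the loopback address is used purely to keep the example self-contained; Celeborn itself picks a non-loopback address as described above.

```java
import java.net.InetAddress;

final class BindAddressSketch {
    // Chooses the string to bind and advertise, based on the preference flag.
    static String bindHost(InetAddress address, boolean preferIpAddress) {
        return preferIpAddress
            ? address.getHostAddress()        // textual IP address
            : address.getCanonicalHostName(); // FQDN, if it can be resolved
    }

    public static void main(String[] args) {
        InetAddress local = InetAddress.getLoopbackAddress();
        System.out.println("preferIpAddress=true  -> " + bindHost(local, true));
        System.out.println("preferIpAddress=false -> " + bindHost(local, false));
    }
}
```

Note that `getCanonicalHostName()` falls back to the textual IP when no name can be resolved, so the FQDN strategy relies on working DNS.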
### How was this patch tested?
Manually tested with `celeborn.network.bind.preferIpAddress=false`:
```
Server: 10.178.96.64
Address: 10.178.96.64#53
Name: celeborn-master-0.celeborn-master-svc.spark.svc.cluster.local
Address: 10.153.143.252
Server: 10.178.96.64
Address: 10.178.96.64#53
Name: celeborn-master-1.celeborn-master-svc.spark.svc.cluster.local
Address: 10.153.173.94
Server: 10.178.96.64
Address: 10.178.96.64#53
Name: celeborn-master-2.celeborn-master-svc.spark.svc.cluster.local
Address: 10.153.149.42
starting org.apache.celeborn.service.deploy.worker.Worker, logging to /opt/celeborn/logs/celeborn--org.apache.celeborn.service.deploy.worker.Worker-1-celeborn-worker-4.out
2023-06-25 23:49:52 [INFO] [main] org.apache.celeborn.common.rpc.netty.Dispatcher#51 - Dispatcher numThreads: 4
2023-06-25 23:49:52 [INFO] [main] org.apache.celeborn.common.network.client.TransportClientFactory#91 - mode NIO threads 64
2023-06-25 23:49:52 [INFO] [main] org.apache.celeborn.common.rpc.netty.NettyRpcEnvFactory#51 - Starting RPC Server [WorkerSys] on celeborn-worker-4.celeborn-worker-svc.spark.svc.cluster.local:0 with advisor endpoint celeborn-worker-4.celeborn-worker-svc.spark.svc.cluster.local:0
2023-06-25 23:49:52 [INFO] [main] org.apache.celeborn.common.util.Utils#51 - Successfully started service 'WorkerSys' on port 38303.
```
Closes #1622 from pan3793/CELEBORN-713.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>