### What changes were proposed in this pull request?
1. Make Celeborn read configs from `HADOOP_CONF_DIR`.
2. Remove unnecessary Kerberos configs.
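One common way for a JVM service to pick up Hadoop client configs is to put the config directory on the classpath, where Hadoop looks for `core-site.xml` and similar files. The sketch below assumes that approach and may not match what this PR actually does; `classpath_with_hadoop_conf` is a hypothetical helper name.

```shell
# Hypothetical helper (not taken from the Celeborn scripts): append
# HADOOP_CONF_DIR to a classpath string when it points at a real directory.
classpath_with_hadoop_conf() {
  cp_base="$1"
  if [ -n "${HADOOP_CONF_DIR:-}" ] && [ -d "$HADOOP_CONF_DIR" ]; then
    # Assumption: the configs are picked up as classpath resources.
    echo "${cp_base}:${HADOOP_CONF_DIR}"
  else
    # Variable unset or not a directory: leave the classpath untouched.
    echo "$cp_base"
  fi
}
```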
### Why are the changes needed?
To support HDFS with Kerberos.
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
GA and cluster.
Closes #2082 from FMX/B1116.
Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
The stop command in `stop-master.sh` and `stop-worker.sh` now waits up to 600s for the process to exit after sending `kill -15`.
The pid file is deleted only when the stop succeeds, so that a retried stop command can still find it.
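The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the scripts: `stop_with_wait` and its arguments are hypothetical, and treating a failed `kill -15` as "already stopped" is a choice made for this sketch.

```shell
# Hypothetical helper: stop the process recorded in a pid file.
# Usage: stop_with_wait <pid_file> [timeout_seconds]
stop_with_wait() {
  sw_pid_file="$1"
  sw_timeout="${2:-600}"    # 600s is the upper bound mentioned above

  if [ ! -f "$sw_pid_file" ]; then
    echo "no pid file: $sw_pid_file"
    return 1
  fi
  sw_pid="$(cat "$sw_pid_file")"

  # kill -15 sends SIGTERM; if it fails, the process is treated as gone.
  if kill -15 "$sw_pid" 2>/dev/null; then
    sw_waited=0
    # kill -0 sends no signal, it only tests whether the pid still exists.
    while kill -0 "$sw_pid" 2>/dev/null; do
      if [ "$sw_waited" -ge "$sw_timeout" ]; then
        echo "process $sw_pid did not stop within ${sw_timeout}s"
        return 1            # keep the pid file so the stop can be retried
      fi
      sleep 1
      sw_waited=$((sw_waited + 1))
    done
  fi

  rm -f "$sw_pid_file"      # removed only once the process has exited
  echo "stopped $sw_pid"
}
```

Because the pid file survives a timed-out stop, running the same command again finds the same pid and resumes waiting.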
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1911 from cxzl25/CELEBORN-975.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Use `status-master.sh` and `status-worker.sh` to check the pid status corresponding to the master and worker.
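A pid-based status check of this kind can be sketched as below. The function name, arguments, and output wording are hypothetical and only illustrate the idea; the real scripts may report status differently.

```shell
# Hypothetical helper: report whether the pid stored in a pid file is alive.
# Usage: check_status <pid_file> <instance_name>
check_status() {
  cs_pid_file="$1"
  cs_name="$2"
  if [ -f "$cs_pid_file" ]; then
    cs_pid="$(cat "$cs_pid_file")"
    # kill -0 sends no signal; it succeeds only if the pid exists
    # and the caller is allowed to signal it.
    if kill -0 "$cs_pid" 2>/dev/null; then
      echo "$cs_name is running (pid $cs_pid)"
      return 0
    fi
  fi
  echo "$cs_name is not running"
  return 1
}
```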
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1912 from cxzl25/CELEBORN-976.
Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
1. Provide `CELEBORN_PREFER_JEMALLOC` configuration to determine whether to enable jemalloc
2. Provide `CELEBORN_JEMALLOC_PATH` to configure the jemalloc path; for example, on CentOS it is `/usr/lib64/libjemalloc.so`
3. Enable jemalloc by default in the docker environment
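How the two settings might interact can be sketched as follows. The variable names come from the list above; the accepted value `true`, the helper name `jemalloc_preload`, and the silent fallback when the library file is missing are assumptions made for illustration.

```shell
# Sketch: print the library to preload, or nothing if jemalloc should
# not be used. Assumes CELEBORN_PREFER_JEMALLOC is "true" to opt in.
jemalloc_preload() {
  jp_prefer="${CELEBORN_PREFER_JEMALLOC:-false}"
  jp_path="${CELEBORN_JEMALLOC_PATH:-}"
  # Only preload when requested AND the library file really exists,
  # which avoids a loader warning about a missing preload object.
  if [ "$jp_prefer" = "true" ] && [ -n "$jp_path" ] && [ -f "$jp_path" ]; then
    echo "$jp_path"
  fi
}
```

A launcher could then export `LD_PRELOAD` only when `jemalloc_preload` prints a non-empty path.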
### Why are the changes needed?
Prevent unnecessary WARNING.
https://github.com/apache/incubator-celeborn/pull/1824#discussion_r1319909938
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
local test
Closes #1895 from cxzl25/CELEBORN-900_diable.
Lead-authored-by: sychen <sychen@ctrip.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Only the Dockerfile needs to change in this PR.
### Why are the changes needed?
When deploying Celeborn for Flink on Kubernetes, introducing jemalloc can improve pod memory usage.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
Running a production job may be needed to verify the memory usage improvement.
Closes #1824 from mddxhj/feature/introduce_jemalloc.
Authored-by: Jun He <xuehaijuxian@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
I find it a little difficult to use `celeborn-daemon.sh` to get the instance status, so I polished the usage and fixed `--config` loading.
### Why are the changes needed?
Ditto
### Does this PR introduce _any_ user-facing change?
Polish the `celeborn-daemon.sh` usage
### How was this patch tested?
Manual test.
Closes #1805 from onebox-li/improve-script.
Lead-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Leo Li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request?
Add --add-opens to bootstrap shell scripts
### Why are the changes needed?
Additional `--add-opens` options are required for Java 17. Note that the `--add-opens` list is copied from Spark and was used for UT; I am not sure each of them is required, but at least the UT passed with them.
Details supplied by cfmcgrady
[JEP 403](https://openjdk.java.net/jeps/403), targeted for [JDK 17](https://openjdk.java.net/projects/jdk/17/), removes the `--illegal-access` flag, which is equivalent to always running with `--illegal-access=deny`.
This means that using reflection to invoke protected methods of exported `java.*` APIs will no longer work. For example:
```shell
> /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell
| Welcome to JShell -- Version 17.0.7
| For an introduction type: /help intro
jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]
jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);
| Exception java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module 34c45dca
| at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:354)
| at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297)
| at Constructor.checkCanSetAccessible (Constructor.java:188)
| at Constructor.setAccessible (Constructor.java:181)
| at (#2:1)
jshell>
```
```shell
> /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell -R --add-opens=java.base/java.nio=ALL-UNNAMED
| Welcome to JShell -- Version 17.0.7
| For an introduction type: /help intro
jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]
jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);
jshell>
```
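In a bootstrap script, the flag from the second session could be added along the lines below. This is only a sketch: the helper names are hypothetical, gating on the Java major version is an assumption (the PR may add the flags unconditionally), and only the single `--add-opens` entry shown in the sessions above is used rather than the full list copied from Spark.

```shell
# Sketch: extract the major version from a Java version string,
# e.g. "1.8.0_372" gives 8 and "17.0.7" gives 17.
java_major_version() {
  jm_ver="$1"
  case "$jm_ver" in
    1.*) jm_ver="${jm_ver#1.}" ;;   # legacy scheme: 1.8.x means Java 8
  esac
  # Drop everything from the first '.', '_' or '-' onwards.
  echo "${jm_ver%%[._-]*}"
}

# Print the module flag only for JDKs that have the module system (9+).
add_opens_opts() {
  if [ "$(java_major_version "$1")" -ge 9 ]; then
    echo "--add-opens=java.base/java.nio=ALL-UNNAMED"
  fi
}
```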
### Does this PR introduce _any_ user-facing change?
Yes, for Java 17 support.
### How was this patch tested?
CI and review
Closes #1677 from pan3793/CELEBORN-763.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### What changes were proposed in this pull request?
Always set JVM opts `-XX:+IgnoreUnrecognizedVMOptions`
### Why are the changes needed?
By default, the JVM fails to start when unrecognized options are set, which is unfriendly to users who want to use different JDK versions.
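One way a launcher could make sure the flag is always present is sketched below. `ensure_ignore_unrecognized` is a hypothetical helper, and adding the flag only when it is missing is a choice made for this sketch rather than something taken from the PR.

```shell
# Sketch: return the given JVM options with the flag added exactly once.
ensure_ignore_unrecognized() {
  case " $* " in
    *" -XX:+IgnoreUnrecognizedVMOptions "*)
      # Already present: return the options unchanged.
      printf '%s\n' "$*" ;;
    *)
      # Prepend the flag; ${1:+ $*} adds the rest only if any were given.
      printf '%s\n' "-XX:+IgnoreUnrecognizedVMOptions${1:+ $*}" ;;
  esac
}
```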
### Does this PR introduce _any_ user-facing change?
Yes, users can successfully start Celeborn even if they provide unknown JVM opts.
### How was this patch tested?
Review.
Closes #1676 from pan3793/CELEBORN-762.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
* [CELEBORN-533] Fix a bug where, in k8s, SIGTERM can't be caught by the worker when the worker is shut down
* add exec command before start_master.sh
* fix master
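
Why `exec` matters here can be shown with a small demonstration. It is not the Celeborn entrypoint; `launch_pids` is a hypothetical helper. Without `exec`, the wrapping shell keeps its pid and the launched command runs as a child with a new pid, so a signal sent to the wrapper's pid is not handled by the command. With `exec`, the command replaces the shell and keeps its pid.

```shell
# Print the wrapper shell's pid, then the pid of the command it launches.
# $1 is "exec" to replace the shell, or empty to fork a child as usual.
# The trailing "true" keeps the launched command from being the last one,
# so the shell cannot silently apply an exec optimisation on its own.
launch_pids() {
  sh -c 'echo "$$"; '"$1"' sh -c "echo \$\$"; true'
}

echo "with exec:";    launch_pids exec   # two identical pids
echo "without exec:"; launch_pids ""     # two different pids
```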