25 Commits

Author SHA1 Message Date
mingji
02cea042a0 [CELEBORN-1116] Read authentication configs from HADOOP_CONF_DIR
### What changes were proposed in this pull request?
1. Make Celeborn read configs from HADOOP_CONF_DIR.
2. Remove unnecessary Kerberos configs.

### Why are the changes needed?
To support HDFS with Kerberos.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
GA and cluster.

Closes #2082 from FMX/B1116.

Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-11-09 11:07:13 +08:00
sychen
07c1dc2568 [CELEBORN-975] Refactor the check logic to stop the celeborn master and worker
### What changes were proposed in this pull request?

`stop-master.sh` and `stop-worker.sh` now wait up to 600s for the process to exit after sending `kill -15`.

Delete the pid file only when the stop succeeds, so that a retried stop command can still find the pid file.
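
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `stop_with_wait` and its arguments are hypothetical, and only the 600s limit and the use of `kill -15` come from the description.

```shell
# Hypothetical helper: stop_with_wait <pid_file> [timeout_seconds]
# Sends SIGTERM, polls until the process exits, and removes the pid file
# only after the process is confirmed gone.
stop_with_wait() {
  local pid_file="$1"
  local timeout="${2:-600}"   # 600s is the limit named in the description
  local pid waited=0

  if [ ! -f "$pid_file" ]; then
    echo "no pid file at $pid_file"
    return 1
  fi
  pid="$(cat "$pid_file")"

  # kill -0 delivers no signal; it only checks that the process exists.
  if kill -0 "$pid" 2>/dev/null; then
    kill -15 "$pid"
    while kill -0 "$pid" 2>/dev/null; do
      if [ "$waited" -ge "$timeout" ]; then
        echo "process $pid still running after ${timeout}s; keeping $pid_file"
        return 1
      fi
      sleep 1
      waited=$((waited + 1))
    done
  fi

  # Reached only when the process is gone, so a failed stop can be retried.
  rm -f "$pid_file"
  echo "stopped $pid"
}
```

Keeping the pid file on timeout is the point of the second change: a later invocation can read the same file and try again.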

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1911 from cxzl25/CELEBORN-975.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-18 16:23:32 +08:00
sychen
375e855d42 [CELEBORN-976] Introduce script to check master and worker status
### What changes were proposed in this pull request?
Use `status-master.sh` and `status-worker.sh` to check the pid status corresponding to the master and worker.
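
A pid-based status check of this kind might look like the sketch below. The function name `check_status`, its arguments, and the messages are hypothetical and are not taken from the scripts themselves.

```shell
# Hypothetical helper: check_status <name> <pid_file>
# Returns 0 when the pid recorded in the file belongs to a live process.
check_status() {
  local name="$1" pid_file="$2" pid

  if [ ! -f "$pid_file" ]; then
    echo "$name is not running (no pid file)"
    return 1
  fi
  pid="$(cat "$pid_file")"

  # kill -0 delivers no signal; it fails if the process does not exist
  # (or if the caller lacks permission to signal it).
  if kill -0 "$pid" 2>/dev/null; then
    echo "$name is running (pid $pid)"
    return 0
  fi

  echo "$name is not running (stale pid file for pid $pid)"
  return 1
}
```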

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1912 from cxzl25/CELEBORN-976.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-16 09:27:52 +08:00
sychen
8b7989ad0c [CELEBORN-900][FOLLOWUP] Disable jemalloc in non-docker environment
### What changes were proposed in this pull request?
1. Provide `CELEBORN_PREFER_JEMALLOC` configuration to determine whether to enable jemalloc
2. Provide `CELEBORN_JEMALLOC_PATH` to configure the jemalloc path; for example, on CentOS it is `/usr/lib64/libjemalloc.so`
3. Enable jemalloc by default in the docker environment
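
One way the two variables could interact is sketched below. Only the names `CELEBORN_PREFER_JEMALLOC` and `CELEBORN_JEMALLOC_PATH` come from this commit; the function name, the `true` value, and the use of `LD_PRELOAD` as the loading mechanism are assumptions made for illustration.

```shell
# Hypothetical helper: prints the library path to preload, or nothing when
# jemalloc should stay disabled. Assumes "true" is the enabling value.
resolve_jemalloc_preload() {
  if [ "${CELEBORN_PREFER_JEMALLOC:-false}" != "true" ]; then
    # Not requested: stay silent, so no warning is printed.
    return 0
  fi
  if [ -n "${CELEBORN_JEMALLOC_PATH:-}" ] && [ -f "$CELEBORN_JEMALLOC_PATH" ]; then
    echo "$CELEBORN_JEMALLOC_PATH"
  else
    echo "WARNING: jemalloc requested but not found at '${CELEBORN_JEMALLOC_PATH:-}'" >&2
  fi
}

# Assumed usage in a launcher script:
#   preload="$(resolve_jemalloc_preload)"
#   [ -n "$preload" ] && export LD_PRELOAD="$preload"
```

With this shape, the warning appears only when a user explicitly opts in and the library is missing, which matches the stated goal of preventing unnecessary warnings.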

### Why are the changes needed?
Prevent unnecessary WARNING.

https://github.com/apache/incubator-celeborn/pull/1824#discussion_r1319909938

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
local test

Closes #1895 from cxzl25/CELEBORN-900_diable.

Lead-authored-by: sychen <sychen@ctrip.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-09-11 14:55:10 +08:00
Jun He
ada12a2c0e
[CELEBORN-900] Prefer to use jemalloc for memory allocation
### What changes were proposed in this pull request?

Only the Dockerfile needs to change in this PR.

### Why are the changes needed?

When deploying Celeborn for Flink on Kubernetes, introducing jemalloc can improve pod memory usage.

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?
Starting a production job may be needed to test the memory usage improvement.

Closes #1824 from mddxhj/feature/introduce_jemalloc.

Authored-by: Jun He <xuehaijuxian@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-09-08 19:49:24 +08:00
Cheng Pan
e137d0e1e7 [CELEBORN-887] Option --config should take effect in celeborn-daemon.sh script
### What changes were proposed in this pull request?
I find it a little difficult to use `celeborn-daemon.sh` to get instance status, so I polished the usage and fixed the `--config` loading.

### Why are the changes needed?
Ditto

### Does this PR introduce _any_ user-facing change?
Polish the `celeborn-daemon.sh` usage

### How was this patch tested?
Manually test.

Closes #1805 from onebox-li/improve-script.

Lead-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Leo Li <lyh-36@163.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
2023-08-11 21:57:21 +08:00
Cheng Pan
5b3f43dffc
[CELEBORN-763] Add --add-opens to bootstrap shell scripts
### What changes were proposed in this pull request?

Add --add-opens to bootstrap shell scripts

### Why are the changes needed?

Additional `--add-opens` options are required for Java 17. Note that the `--add-opens` list is copied from Spark and was used for UT; I am not sure each entry is required, but at least the UT passed with them.

Details supplied by cfmcgrady

[JEP 403](https://openjdk.java.net/jeps/403), targeted for [JDK 17](https://openjdk.java.net/projects/jdk/17/), will remove the `--illegal-access` flag. The behavior will then be equivalent to `--illegal-access=deny`.

This means using reflection to invoke protected methods of exported `java.*` APIs will no longer work. For example:

```shell
> /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell
|  Welcome to JShell -- Version 17.0.7
|  For an introduction type: /help intro

jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]

jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);
|  Exception java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module 34c45dca
|        at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:354)
|        at AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297)
|        at Constructor.checkCanSetAccessible (Constructor.java:188)
|        at Constructor.setAccessible (Constructor.java:181)
|        at (#2:1)

jshell>

```

```shell
>  /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/jshell -R --add-opens=java.base/java.nio=ALL-UNNAMED
|  Welcome to JShell -- Version 17.0.7
|  For an introduction type: /help intro

jshell> java.nio.ByteBuffer direct = java.nio.ByteBuffer.allocateDirect(1);
direct ==> java.nio.DirectByteBuffer[pos=0 lim=1 cap=1]

jshell> direct.getClass().getDeclaredConstructor(long.class, int.class).setAccessible(true);

jshell>
```

### Does this PR introduce _any_ user-facing change?

Yes, for Java 17 support.

### How was this patch tested?

CI and review

Closes #1677 from pan3793/CELEBORN-763.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-05 11:31:21 +08:00
Cheng Pan
de0fd8cc44 [CELEBORN-762] Always set JVM opts -XX:+IgnoreUnrecognizedVMOptions
### What changes were proposed in this pull request?

Always set JVM opts `-XX:+IgnoreUnrecognizedVMOptions`

### Why are the changes needed?

By default, the JVM fails to start when unknown options are set, which is unfriendly to users who want to use different versions of the JDK.
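
As a rough illustration of "always set", a launcher could prepend the flag to whatever options the user supplied. The helper name `with_ignore_unrecognized` is hypothetical, and this is not the code from the PR; only the flag itself comes from the commit.

```shell
# Hypothetical helper: returns the given JVM options with
# -XX:+IgnoreUnrecognizedVMOptions prepended, unless already present.
with_ignore_unrecognized() {
  local opts="$1"
  case " $opts " in
    *" -XX:+IgnoreUnrecognizedVMOptions "*)
      echo "$opts"
      ;;
    *)
      # ${opts:+ $opts} avoids a trailing space when no options were given.
      echo "-XX:+IgnoreUnrecognizedVMOptions${opts:+ $opts}"
      ;;
  esac
}
```

The check for an existing occurrence keeps the helper idempotent, so calling it from several scripts does not duplicate the flag.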

### Does this PR introduce _any_ user-facing change?

Yes, users can successfully start Celeborn even if they provide unknown JVM opts.

### How was this patch tested?

Review.

Closes #1676 from pan3793/CELEBORN-762.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-07-04 21:37:19 +08:00
Cheng Pan
a3ad8bbcd5
[CELEBORN-607] Simplify bootstrap scripts for adding --add-opens java opts (#1512) 2023-05-24 23:20:25 +08:00
Aaron Wang
6dad856fec
[CELEBORN-564] Correct stop-all.sh comments (#1470) 2023-04-28 09:38:59 +08:00
zhongqiangchen
d531ec499e
[CELEBORN-533] Bootstrap scripts should use exec to avoid fork subprocess (#1437)
* [CELEBORN-533] fix bug where, in k8s, SIGTERM can't be caught by the worker when the worker is shut down

* add exec command before start_master.sh

* fix master
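
The effect of `exec` can be seen in a small, self-contained sketch. The two function names are hypothetical, and `sh -c` stands in for the real start scripts. Each function prints the wrapper's PID followed by the PID of the command it launches.

```shell
# Without exec: the wrapper shell forks a child and stays alive as its
# parent, so a signal sent to the wrapper's PID does not reach the child
# unless the wrapper forwards it.
launch_without_exec() {
  # The trailing ':' keeps the shell from optimizing the fork away.
  sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}

# With exec: the launched command replaces the wrapper shell and keeps its
# PID, so SIGTERM sent to that PID is delivered directly to the command.
launch_with_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}
```

`launch_with_exec` prints the same PID twice, while `launch_without_exec` prints two different PIDs. In a container, where the runtime signals the entrypoint's PID, this is the difference between the worker seeing SIGTERM and the wrapper shell receiving it instead.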
2023-04-19 22:00:12 +08:00
hongdd
ae390d9615
[CELEBORN-439] Fix java version check in start-work (#1359) 2023-03-17 16:45:57 +08:00
Ethan Feng
599bdbeb72
[CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354) 2023-03-16 11:33:32 +08:00
Angerszhuuuu
e805c74aad
[CELEBORN-360] Export necessary env in load-celeborn-env.sh (#1308) 2023-03-03 21:09:11 +08:00
Angerszhuuuu
734f14991a
[CELEBORN-360] Add celeborn ratis shell command line (#1294)
* [CELEBORN-360] Add celeborn ratis shell command line
2023-03-02 16:30:45 +08:00
Rex(Hui) An
e23f5ac679
[CELEBORN-258][FOLLOW UP] sbin/restart-worker.sh should also import the sbin/celeborn-config.sh
Co-authored-by: Hui An <hui.an@shopee.com>
2023-02-01 17:28:20 +08:00
Angerszhuuuu
2577f09938
[CELEBORN-259] Correct wrong comment in restart.sh (#1194) 2023-02-01 14:54:13 +08:00
Rex(Hui) An
0f97fbf38d
[CELEBORN-258] sbin/restart-worker.sh should respect CELEBORN_WORKER_MEMORY and CELEBORN_WORKER_OFFHEAP_MEMORY (#1192)
Co-authored-by: Hui An <hui.an@shopee.com>
2023-02-01 14:36:31 +08:00
Ethan Feng
e219e8b44e
[CELEBORN-171] Support JDK11. (#1169) 2023-01-16 19:35:54 +08:00
Binjie Yang
a31dcc8194
[IMPROVEMENT] Improve celeborn script logic (#1020) 2022-11-30 20:03:56 +08:00
Ethan Feng
59474c2f11
[INFRA]Update scripts and templates for new name. (#724) 2022-10-09 14:56:06 +08:00
Keyong Zhou
fe3b5988f2
[REFACTOR] Change package name to org.apache.celeborn (#710) 2022-10-02 18:10:29 +08:00
Cheng Pan
3dddb65f31
Enable Apache Rat and fix license header (#492) 2022-08-31 23:53:33 +08:00
AngersZhuuuu
3a42f172fa
[ISSUE-388][SHELL] Add restart script for worker's quick recover (#460) 2022-08-25 15:18:25 +08:00
zky.zhoukeyong
ba5920acde Initial Commit for RSS 2021-12-28 20:57:35 +08:00