Commit Graph

921 Commits

Author SHA1 Message Date
Cheng Pan
007b716b64
[CELEBORN-633][INFRA] Introduce PR merge script
### What changes were proposed in this pull request?

Introduce the PR merge script `dev/merge_pr.py`, borrowed from Apache Spark.

### Why are the changes needed?

This script simplifies the PR merge procedure:

- auto backport to release branches
- auto close the JIRA ticket
- auto fill in the JIRA fix version
- preserve the PR description in git log
- preserve the author and committer in git log
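The commit-message layout that the script preserves is visible in the log entries here: the PR description, then `Closes #N from user/branch.`, then `Authored-by:` and `Signed-off-by:` trailers. The Python sketch below reproduces only that layout. `build_merge_message` is a hypothetical helper, not a function from `dev/merge_pr.py`, and it does not call git or JIRA.

```python
def build_merge_message(title: str, body: str, pr_num: int, source_ref: str,
                        author: str, committer: str) -> str:
    """Compose a squash-merge commit message (hypothetical helper).

    Keeps the PR description in the body and records the author and the
    committer as trailers, mirroring the layout seen in this commit log.
    """
    lines = [title, ""]
    # Preserve the PR description, skipping it if it is blank.
    if body.strip():
        lines += [body.strip(), ""]
    lines += [f"Closes #{pr_num} from {source_ref}.", ""]
    # Preserve both identities, even when the committer is not the author.
    lines.append(f"Authored-by: {author}")
    lines.append(f"Signed-off-by: {committer}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_merge_message(
        "[CELEBORN-633][INFRA] Introduce PR merge script",
        "Introduce the PR merge script `dev/merge_pr.py`.",
        1539,
        "pan3793/CELEBORN-633",
        "Cheng Pan <chengpan@apache.org>",
        "Cheng Pan <chengpan@apache.org>",
    ))
```

Keeping the trailers as plain text in the message means the information survives a squash merge, where GitHub's own PR metadata is not part of the git history.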

### Does this PR introduce _any_ user-facing change?

No, it's for committers.

### How was this patch tested?

Commit a1de16a80f was merged using this tool.

Closes #1539 from pan3793/CELEBORN-633.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-02 19:52:04 +08:00
zwangsheng
67762783d0
[CELEBORN-628][HELM] Separate mount & host path on hostPath case
### What changes were proposed in this pull request?
Separate the mount path and the volume (host) path in the Kubernetes hostPath case.

### Why are the changes needed?
See detail in https://github.com/apache/incubator-celeborn/pull/1508#discussion_r1208803085

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Local Test

> values.yaml
```yaml
volumes:
  master:
    - mountPath: /mnt/rss_ratis
      hostPath: /spark/data1
      type: hostPath
      size: 1Gi
  worker:
    - mountPath: /mnt/disk1
      hostPath: /spark/data1
      type: hostPath
      size: 1Gi
    - mountPath: /mnt/disk2
      hostPath: /spark/data2
      type: hostPath
      size: 1Gi
```

> Celeborn Worker Pod
```yaml
containers:
  volumeMounts:
    - mountPath: /mnt/disk1
      name: celeborn-worker-vol-0
    - mountPath: /mnt/disk2
      name: celeborn-worker-vol-1
volumes:
  - hostPath:
      path: /spark/data1/worker
      type: DirectoryOrCreate
    name: celeborn-worker-vol-0
  - hostPath:
      path: /spark/data2/worker
      type: DirectoryOrCreate
    name: celeborn-worker-vol-1
```
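The mapping from the `values.yaml` entries to the rendered worker pod can be approximated as below. This is a Python sketch under assumptions, not the chart's template logic: `render_host_path_volumes` is hypothetical, it handles only `type: hostPath`, and it assumes that the volume name index is the entry's position in the list and that the component name (`master` or `worker`) is appended to `hostPath`, as the rendered pod suggests.

```python
def render_host_path_volumes(component, entries):
    """Map values.yaml volume entries to (volumeMounts, volumes) lists.

    Hypothetical helper for illustration only.
    """
    mounts, volumes = [], []
    for i, entry in enumerate(entries):
        if entry.get("type") != "hostPath":
            continue  # this sketch only models hostPath volumes
        name = f"celeborn-{component}-vol-{i}"
        # Inside the container, the volume appears at mountPath.
        mounts.append({"mountPath": entry["mountPath"], "name": name})
        # On the host, each component gets its own subdirectory.
        host_dir = entry["hostPath"].rstrip("/") + "/" + component
        volumes.append({
            "hostPath": {"path": host_dir, "type": "DirectoryOrCreate"},
            "name": name,
        })
    return mounts, volumes
```

Appending the component name presumably keeps the master and the worker from sharing a single host directory when, as in the example, both point at `/spark/data1`.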

Closes #1535 from zwangsheng/CELEBORN-628.

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-02 19:47:13 +08:00
Shuang
a1de16a80f
[CELEBORN-626] Fix potential deadlock in filewriter
### What changes were proposed in this pull request?
Lock the flushBuffer field and the flush method to ensure thread-safe access.

### Why are the changes needed?
When a stage ends, the worker commits files and closes the file writers, but a speculative task may still push data to a file writer. If that push task has already incremented numPendingWrites, the commit thread, which holds the file writer's object lock, must wait for the pending writes to drop to 0. However, the push thread needs the same object lock to decrement numPendingWrites, so the two threads deadlock.
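A Python model of the locking pattern may make the fix easier to follow. The real FileWriter is Java, and the class below is hypothetical. It assumes the fix separates the lock guarding the buffer and `flush()` from the one guarding the pending-write counter, so that a push thread can decrement the counter while the closing thread is waiting. `threading.Condition.wait_for` releases its lock while it waits.

```python
import threading


class FileWriterSketch:
    """Hypothetical model of the locking pattern, not Celeborn's FileWriter."""

    def __init__(self):
        self._flush_lock = threading.Lock()    # guards buffer and flush()
        self._pending = threading.Condition()  # guards num_pending_writes
        self.num_pending_writes = 0
        self.buffer = []
        self.flushed = []

    def increment_pending_writes(self):
        with self._pending:
            self.num_pending_writes += 1

    def decrement_pending_writes(self):
        # Needs only the counter's lock, never the flush lock, so it can
        # run while close() is waiting for pending writes to drain.
        with self._pending:
            self.num_pending_writes -= 1
            if self.num_pending_writes == 0:
                self._pending.notify_all()

    def write(self, data):
        with self._flush_lock:
            self.buffer.append(data)

    def flush(self):
        with self._flush_lock:
            self.flushed.extend(self.buffer)
            self.buffer.clear()

    def close(self, timeout=None):
        # wait_for releases the condition's lock while blocked, so pushers
        # can still decrement the counter; returns False on timeout.
        with self._pending:
            drained = self._pending.wait_for(
                lambda: self.num_pending_writes == 0, timeout)
        self.flush()
        return drained
```

The key point is that the closing thread never blocks while holding a lock that the pushing thread needs in order to make progress.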

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #1534 from RexXiong/CELEBORN-626.

Authored-by: Shuang <lvshuang.tb@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-02 17:47:39 +08:00
zhongqiang.czq
3d9a28a98d
[CELEBORN-630] Binary release artifact should package all versions of Spark and Flink clients
…link version

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #1537 from zhongqiangczq/release-content.

Authored-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2023-06-02 17:40:41 +08:00
Angerszhuuuu
4f1ca8c960
[CELEBORN-621][BUG] Push merged data task timeout and mapended should also remove push states (#1526) 2023-06-02 14:04:39 +08:00
Angerszhuuuu
e18a5ea769
[CELEBORN-624] StorageManager should only remove expired app dirs (#1531) 2023-06-02 11:33:33 +08:00
Ethan Feng
d33916e571
[CELEBORN-625] Add a config to enable/disable UnsafeRow fast write. (#1532) 2023-06-01 20:55:45 +08:00
Angerszhuuuu
cf308aa057
[CELEBORN-595] Refine code frame of CelebornConf (#1525) 2023-06-01 10:37:58 +08:00
Binjie Yang
b785c7c565
[CELEBORN-612][HELM] Tackle hostPath directory permission (#1521) 2023-06-01 10:34:25 +08:00
ulysses
fa920ab0d5
Relax isRssEnabled condition (#1528)
Co-authored-by: youxiduo <youxiduo@corp.netease.com>
2023-05-31 15:26:05 +08:00
Angerszhuuuu
6d5dd50915
[CELEBORN-595][FOLLOWUP] Fix change version to 0.3.0. (#1522) 2023-05-30 20:12:56 +08:00
Angerszhuuuu
62681ba85d
[CELEBORN-595] Rename and refactor the configuration doc. (#1501) 2023-05-30 15:14:12 +08:00
zhongqiangchen
f117cff776
[CELEBORN-618] [FLINK] worker side adds partition split configuration options (#1520) 2023-05-30 14:13:31 +08:00
Angerszhuuuu
c4bff654b0
[CELEBORN-614] Simplify StorageManager's flushFileWriters to avoid too much cost on collection operation (#1517) 2023-05-30 11:38:05 +08:00
Binjie Yang
d30f45ad63
[CELEBORN-450][HELM] Configurable volumes in the values.yaml (#1508)
* [CELEBORN-450] Configure the mount & volume in the Values.yaml

* fix comments

* fix wrong name

* fix comments

* fix typo

* fix into array

* With User Note Comments

* fix comments

* Update charts/celeborn/templates/worker-statefulset.yaml

---------

Co-authored-by: Cheng Pan <pan3793@gmail.com>
2023-05-29 13:48:23 +08:00
Angerszhuuuu
07011f5a4d
[CELEBORN-601] Consolidate configsWithAlternatives with ConfigBuilder.withAlternative (#1506) 2023-05-28 09:13:05 +08:00
Shuang
2972c5f7d3
[CELEBORN-611] Improve log4j's configuration for deleting old log files when they match the conditions. (#1516) 2023-05-25 20:51:53 +08:00
Cheng Pan
df385bedd3
[CELEBORN-608][BUILD] Exclude macOS fflags in make-distribution.sh (#1513) 2023-05-25 14:25:13 +08:00
Cheng Pan
c29f2f0aa8
[CELEBORN-605][BUILD] Remove redundant exclusions from hadoop-client-api (#1510) 2023-05-25 10:40:15 +08:00
Cheng Pan
a3ad8bbcd5
[CELEBORN-607] Simplify bootstrap scripts for adding --add-opens java opts (#1512) 2023-05-24 23:20:25 +08:00
Ethan Feng
4ee7d9eba8
[CELEBORN-597][FLINK] Support flink floating buffer for input gate and output gate. (#1503) 2023-05-24 23:15:57 +08:00
Cheng Pan
ef8e556202
[CELEBORN-604][SPARK] Support Spark 3.4 (#1509) 2023-05-24 23:10:13 +08:00
Angerszhuuuu
7572c5b261
[CELEBORN-609] Refactor master's worker info HTTP request (#1514) 2023-05-24 18:15:39 +08:00
Angerszhuuuu
4f85d80687
[CELEBORN-606] Refine CommitHandler's noisy log (#1511) 2023-05-24 15:25:10 +08:00
zhongqiangchen
e6978c380b
[CELEBORN-603] Update version to 0.4.0-SNAPSHOT (#1507) 2023-05-24 14:31:10 +08:00
Leo Li
de97ad26ce
[CELEBORN-599] Consolidate calculation of mount point (#1505)
* [CELEBORN-599] Fix worker dirs get mount point

* update

* update

---------

Co-authored-by: liyihe <liyihe@bigo.sg>
2023-05-23 14:06:02 +08:00
Angerszhuuuu
6619015a63
[CELEBORN-596] Worker doesn't need to update disk max slots (#1502) 2023-05-23 10:30:35 +08:00
minseok
6e166662f1
[CELEBORN-598] Fix Typos in README 2023-05-21 19:36:38 +08:00
Angerszhuuuu
d244f44518
[CELEBORN-593] Refine some RPC related default configurations (#1498) 2023-05-19 18:23:12 +08:00
Angerszhuuuu
615d9a111f
[CELEBORN-487] Remove wrong space of config SHUFFLE_CLIENT_PUSH_BLACK (#1500) 2023-05-19 14:27:57 +08:00
Ethan Feng
ac78afdc4e
[CELEBORN-594] Eliminate Ratis noisy logs. (#1499) 2023-05-19 14:05:52 +08:00
Angerszhuuuu
aa817bdbeb
[CELEBORN-446][FOLLOWUP] Check rack should use nextMasterIndex. (#1496) 2023-05-18 16:25:14 +08:00
Angerszhuuuu
42219aeb2a
[CELEBORN-592][REFACTOR] Refactor the format of some foreach code in PbSerDeUtils (#1497) 2023-05-18 16:22:14 +08:00
Shuang
6eabc519b3
[CELEBORN-591] RatisSystem needs to decrease the no-leader timeout configuration. (#1495) 2023-05-18 14:49:06 +08:00
Angerszhuuuu
811e192bbd
[CELEBORN-446] Support rack aware during assign slots for ROUNDROBIN (#1370) 2023-05-18 13:58:51 +08:00
Kaijie Chen
67bc420801
[CELEBORN-558] Bump Ratis to 2.5.1 and fix API changes (#1464) 2023-05-18 11:08:37 +08:00
Ethan Feng
7015d2463a
[CELEBORN-583] Merge pooled memory allocators. (#1490) 2023-05-18 10:37:30 +08:00
Angerszhuuuu
a22c61e479
[CELEBORN-582] Celeborn should handle InterruptedException during kill task properly (#1486) 2023-05-17 18:17:41 +08:00
Angerszhuuuu
791d72d45f
[CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR (#1494) 2023-05-17 17:57:27 +08:00
Angerszhuuuu
7c6cb2f3bb
[CELEBORN-588] Remove test conf's category (#1491) 2023-05-17 17:37:28 +08:00
Cheng Pan
3cc296ef4f
[CELEBORN-589][INFRA] Using Apache CDN to download maven (#1492) 2023-05-17 15:46:38 +08:00
Leo Li
65cdb3eba4
[CELEBORN-585] Create worker recoverPath if it does not exist when graceful shutdown is enabled (#1487) 2023-05-17 11:29:09 +08:00
Angerszhuuuu
64a3534f71
[CELEBORN-584] Worker side should expose push/replicate/fetch Netty allocator metrics (#1489) 2023-05-16 17:51:33 +08:00
Shuang
f83304c337
[CELEBORN-581][Flink] Support JobManager failover. (#1485) 2023-05-16 14:51:53 +08:00
Angerszhuuuu
d657f8268a
[CELEBORN-586] Add SystemMiscSource to indicate system running status (#1488) 2023-05-16 14:03:07 +08:00
zhongqiangchen
5769c3fdc7
[CELEBORN-552] Add HeartBeat between the client and worker to keep alive (#1457) 2023-05-10 19:35:51 +08:00
Shuang
fb753fd48e
[CELEBORN-573] Guarantee resource/app/worker changes are persisted to Raft in HA mode. (#1477) 2023-05-10 14:28:52 +08:00
Angerszhuuuu
778b5440bc
[CELEBORN-556][BUG] ReserveSlot should not use the default RPC timeout since the register shuffle max timeout is the network timeout (#1461) 2023-05-10 12:29:06 +08:00
Shuang
2fea818fa8
[CELEBORN-579] revert Destroy Message rename for compatibility. (#1482) 2023-05-09 15:24:02 +08:00
Angerszhuuuu
5f7e1ce8e2
[CELEBORN-578][REFACTOR] Refine commit file's log to indicate empty partitions more clearly (#1481) 2023-05-08 18:21:46 +08:00