celeborn

Author	SHA1	Message	Date
Luke Yan	c7c2f6a35a	[CELEBORN-858] Generate patch to each Spark 3.x minor version ### What changes were proposed in this pull request? Add the following patch files in directory `incubator-celeborn/tree/spark3-patch/assets/spark-patch`: 1. Celeborn_Dynamic_Allocation_spark3_0.patch 2. Celeborn_Dynamic_Allocation_spark3_1.patch 3. Celeborn_Dynamic_Allocation_spark3_2.patch 4. Celeborn_Dynamic_Allocation_spark3_3.patch Delete a patch at the same time: 1. Celeborn_Dynamic_Allocation_spark3.patch Modified `Support Spark Dynamic Allocation` in incubator-celeborn/README.md: ![image](https://github.com/apache/incubator-celeborn/assets/108530647/61e2e69b-d3f5-4d11-a20b-374622936443) ### Why are the changes needed? Convenient for customers to apply patches in Spark 3.X for `Support Spark Dynamic Allocation` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? yes. All patch files can be applied to the corresponding version of Spark source code through `git apply` without any code conflicts. Closes #2085 from lukeyan2023/spark3-patch. Authored-by: Luke Yan <108530647+lukeyan2023@users.noreply.github.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-11-10 15:35:54 +08:00
mingji	02cea042a0	[CELEBORN-1116] Read authentication configs from `HADOOP_CONF_DIR` ### What changes were proposed in this pull request? 1. Make Celeborn read configs from HADOOP_CONF_DIR. 2. Remove unnecessary Kerberos configs. ### Why are the changes needed? To support HDFS with Kerberos. ### Does this PR introduce _any_ user-facing change? NO. ### How was this patch tested? GA and cluster. Closes #2082 from FMX/B1116. Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com> Co-authored-by: Fu Chen <cfmcgrady@gmail.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-11-09 11:07:13 +08:00
sychen	efa22a4936	[CELEBORN-1105][FLINK] Support Flink 1.18 ### What changes were proposed in this pull request? ### Why are the changes needed? ```bash flink-1.18.0 ./bin/start-cluster.sh ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH ``` ```java Caused by: java.lang.NoSuchMethodError: org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.<init>(Ljava/lang/String;ILorg/apache/flink/runtime/jobgraph/IntermediateDataSetID;Lorg/apache/flink/runtime/io/network/partition/ResultPartitionType;Lorg/apache/flink/runtime/executiongraph/IndexRange;ILorg/apache/flink/runtime/io/network/partition/PartitionProducerStateProvider;Lorg/apache/flink/util/function/SupplierWithException;Lorg/apache/flink/runtime/io/network/buffer/BufferDecompressor;Lorg/apache/flink/core/memory/MemorySegmentProvider;ILorg/apache/flink/runtime/throughput/ThroughputCalculator;Lorg/apache/flink/runtime/throughput/BufferDebloater;)V at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate$FakedRemoteInputChannel.<init>(RemoteShuffleInputGate.java:225) at org.apache.celeborn.plugin.flink.RemoteShuffleInputGate.getChannel(RemoteShuffleInputGate.java:179) at org.apache.flink.runtime.io.network.partition.consumer.InputGate.setChannelStateWriter(InputGate.java:90) at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setChannelStateWriter(InputGateWithMetrics.java:120) at org.apache.flink.streaming.runtime.tasks.StreamTask.injectChannelStateWriterIntoChannels(StreamTask.java:524) at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:496) ``` Flink 1.18.0 release https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/ Interface `org.apache.flink.runtime.io.network.buffer.Buffer` adds `setRecycler` method. 
[[FLINK-32549](https://issues.apache.org/jira/browse/FLINK-32549)][network] Tiered storage memory manager supports ownership transfer for buffers `org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate` constructor adds parameters. [[FLINK-31638](https://issues.apache.org/jira/browse/FLINK-31638)][network] Introduce the TieredStorageConsumerClient to SingleInputGate [[FLINK-31642](https://issues.apache.org/jira/browse/FLINK-31642)][network] Introduce the MemoryTierConsumerAgent to TieredStorageConsumerClient ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ```bash flink-1.18.0 ./bin/flink run examples/streaming/WordCount.jar --execution-mode BATCH Executing example with default input data. Use --input to specify file input. Printing result to stdout. Use --output to specify output path. Job has been submitted with JobID d7fc5f0ca018a54e9453c4d35f7c598a Program execution finished Job with JobID d7fc5f0ca018a54e9453c4d35f7c598a has finished. Job Runtime: 1635 ms ``` <img width="1297" alt="image" src="https://github.com/apache/incubator-celeborn/assets/3898450/6a5266bf-2386-4386-b98b-a60d2570fa99"> Closes #2063 from cxzl25/CELEBORN-1105. Authored-by: sychen <sychen@ctrip.com> Signed-off-by: Shuang <lvshuang.tb@gmail.com>	2023-11-06 15:53:39 +08:00
SteNicholas	f61fe17551	[CELEBORN-987][FOLLOWUP][DOC] README#Build and sbt#System Requirements should extend to Scala 2.13 and Spark 3.5 ### What changes were proposed in this pull request? `README#Build` and `sbt#System Requirements` extends to Scala 2.13. ### Why are the changes needed? `README#Build` and `sbt#System Requirements`should extend to Scala 2.13 to align the SBT CI test results. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? SBT CI tests. Closes #1987 from SteNicholas/CELEBORN-987. Authored-by: SteNicholas <programgeek@163.com> Signed-off-by: Fu Chen <cfmcgrady@gmail.com>	2023-10-14 09:54:22 +08:00
SteNicholas	c97628c510	[CELEBORN-987][DOC] README#Build should extend to Java8/11/17 ### What changes were proposed in this pull request? `README#Build` extends to Java8/11/17. Meanwhile, the profile of maven adds `jdk-17`. ### Why are the changes needed? `README#Build` should extend to Java8/11/17. Meanwhile, the profile of maven should add jdk-17. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Local maven compile. Closes #1985 from SteNicholas/CELEBORN-987. Authored-by: SteNicholas <programgeek@163.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-10-12 21:58:32 +08:00
Bowen Song	a734b8cb79	[CELEBORN-1020] Remove outdated info in README.md file ### What changes were proposed in this pull request? The description about restart a Celeborn cluster is outdated, remove this part in README file Closes #1957 from zgzzbws/edit-doc. Authored-by: Bowen Song <song_bowen_work@163.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-10-09 00:11:47 +08:00
mingji	95c9ccfc3e	[CELEBORN-1010] Update docs about `spark.shuffle.service.enabled` ### What changes were proposed in this pull request? To clarify a spark config to work with Celeborn. ### Why are the changes needed? After some tests, I found that Spark 3.1 and newer can work with Celeborn with `spark.shuffle.service.enabled=true`. ExternalShuffleBlockResolver won't check the shuffle manager's type since Spark 3.1 and newer. ### Does this PR introduce _any_ user-facing change? NO. ### How was this patch tested? I tested two scenarios about this PR. 1. Check whether Spark can release the executors in time. 2. Check data correctness by running TPC-DS. All checks are good. Closes #1955 from FMX/CELEBORN-1010. Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-10-08 09:15:42 +08:00
zhouyifan279	333db39713	[CELEBORN-954] Add documentation about reliable shuffle data storage ### What changes were proposed in this pull request? As title ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? Yes. A new config was added in [README.md ](https://github.com/apache/incubator-celeborn/blob/main/README.md#spark-configuration). ### How was this patch tested? Closes #1938 from zhouyifan279/reliable-storage-doc. Authored-by: zhouyifan279 <zhouyifan279@gmail.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-09-27 00:39:14 +08:00
mingji	e0c00ecd38	[CELEBORN-839][MR] Support Hadoop MapReduce ### What changes were proposed in this pull request? 1. Map side merge and push. 2. Support hadoop2 & 3. 3. Reduce in-memory merge. 4. Integrate LifecycleManager to RmApplicationMaster. ### Why are the changes needed? Ditto. ### Does this PR introduce _any_ user-facing change? NO. ### How was this patch tested? Cluster. I tested this PR on a cluster with a 4x 16 CPU 64G Mem 4ESSD cluster. Hadoop 2.8.5 1TB Terasort, 8400 mappers, 1000 reducers Celeborn 81min vs MR shuffle 89min ![mr1](https://github.com/apache/incubator-celeborn/assets/4150993/a3cf6493-b6ff-4c03-9936-4558cf22761d) ![mr2](https://github.com/apache/incubator-celeborn/assets/4150993/9119ffb4-6996-4b77-bcdf-cbd6db5c096f) 1GB wordcount, 8 mappers, 8 reducers Celeborn 35s VS MR shuffle 38s ![mr3](https://github.com/apache/incubator-celeborn/assets/4150993/907dce24-16b7-4788-ab5d-5b784fd07d47) ![mr4](https://github.com/apache/incubator-celeborn/assets/4150993/8e8065b9-6c46-4c8d-9e71-45eed8e63877) Closes #1830 from FMX/CELEBORN-839. Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com> Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-09-14 14:12:53 +08:00
mingji	2ee6e305f1	[CELEBORN-941] fix incorrect deploy doc ### What changes were proposed in this pull request? Fix the incorrect deploy doc about using HDFS only. ### Why are the changes needed? Ditto. ### Does this PR introduce _any_ user-facing change? NO. ### How was this patch tested? Just docs. Closes #1874 from FMX/CELEBORN-941. Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com> Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>	2023-08-31 18:54:27 +08:00
liangbowen	1bf93991bc	[CELEBORN-893][DOC] Fix Spark patch list text in Readme ### What changes were proposed in this pull request? - Fix the text of Spark patch list ### Why are the changes needed? Before: <img width="909" alt="image" src="https://github.com/apache/incubator-celeborn/assets/1935105/1d402df1-3a68-4810-8f84-8ab61a38314c"> After: <img width="908" alt="image" src="https://github.com/apache/incubator-celeborn/assets/1935105/2c733568-a08a-4951-bd5a-f4a444a28833"> ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Screenshots attached. Closes #1810 from bowenliang123/readme-patch. Authored-by: liangbowen <liangbowen@gf.com.cn> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-08-14 14:54:58 +08:00
e	f78a7d349f	[CELEBORN-794] Fix link of CONFIGURATIONS in README ### What changes were proposed in this pull request? Modify CONFIGURATIONS to point to the correct address ### Why are the changes needed? CONFIGURATIONS in README.md points to an invalid address ![image](https://github.com/apache/incubator-celeborn/assets/14961757/538294ee-3432-4e1e-a45e-4dc1983d50e8) ![image](https://github.com/apache/incubator-celeborn/assets/14961757/d4681603-5317-46ae-a2f5-e58fa72c706c) ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? NO Closes #1714 from jiaoqingbo/CELEBORN-794. Authored-by: e <1178404354@qq.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-14 18:08:09 +08:00
mingji	d0ecf83fec	[CELEBORN-764] Fix celeborn on HDFS might clean using app directories ### What changes were proposed in this pull request? Make Celeborn leader clean expired app dirs on HDFS when an application is Lost. ### Why are the changes needed? If Celeborn is working on HDFS, the storage manager starts and cleans expired app directories, and the newly created worker will want to delete any unknown app directories. This will cause using app directories to be deleted unexpectedly. ### Does this PR introduce _any_ user-facing change? NO. ### How was this patch tested? UT and cluster. Closes #1678 from FMX/CELEBORN-764. Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-05 23:11:50 +08:00
zhongqiang.czq	a0f4be67a9	[CELEBORN-765][DOC] Disable partitionSplit in Flink engine related configurations ### What changes were proposed in this pull request? In Doc Readme, setting partitionSplit to false should be added in Flink engine related configurations. ### Why are the changes needed? Currently, MapPartition split is not supported, but shuffle partition split is enabled by default, so an error will be thrown when a Flink task's shuffle data size exceeds 1G (by default). ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually Closes #1679 from zhongqiangczq/readme. Authored-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com> Signed-off-by: zhongqiang.czq <zhongqiang.czq@alibaba-inc.com>	2023-07-05 18:04:10 +08:00
Angerszhuuuu	693172d0bd	[CELEBORN-751] Rename remain rss related class name and filenames etc ### What changes were proposed in this pull request? Rename remain rss related class name and filenames etc... ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1664 from AngersZhuuuu/CELEBORN-751. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>	2023-07-04 10:20:08 +08:00
Angerszhuuuu	5c7ecb8302	[CELEBORN-754][IMPORTANT] Provide a new SparkShuffleManager to replace RssShuffleManager in the future ### What changes were proposed in this pull request? Provide a new SparkShuffleManager to replace RssShuffleManager in the future ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1667 from AngersZhuuuu/CELEBORN-754. Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-30 17:27:33 +08:00
Angerszhuuuu	6e35745736	[CELEBORN-753] Rename spark patch file name to make it more clear ### What changes were proposed in this pull request? Rename spark patch file name to make it more clear ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1666 from AngersZhuuuu/CELEBORN-753. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>	2023-06-30 11:41:12 +08:00
Angerszhuuuu	bd7c2ea35a	[CELEBORN-746][BUILD] Rename project files from rss-xx to celeborn-xx ### What changes were proposed in this pull request? Rename project files from rss-xx to celeborn-xx ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1660 from AngersZhuuuu/CELEBORN-746. Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>	2023-06-29 16:30:02 +08:00
mingji	40760ede3a	[CELEBORN-568] Support storage type selection ### What changes were proposed in this pull request? 1. Celeborn supports storage type selection. HDD, SSD, and HDFS are available for now. 2. Add new buffer size for HDFS file writers. 3. Worker support empty working dirs. ### Why are the changes needed? Support HDFS only scenario. ### Does this PR introduce _any_ user-facing change? NO. ### How was this patch tested? UT and cluster. Closes #1619 from FMX/CELEBORN-568. Lead-authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com> Co-authored-by: Ethan Feng <fengmingxiao.fmx@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-06-27 18:07:08 +08:00
Cheng Pan	e22379c3ab	[CELEBORN-638] Migrate configurations celeborn.ha.master.* to celeborn.master.ha.* ### What changes were proposed in this pull request? It was discussed during the last meeting, but abandoned due to the complication. ### Why are the changes needed? Make the configuration unified. ### Does this PR introduce _any_ user-facing change? Yes, but the legacy configurations still take effect. ### How was this patch tested? New UTs. Closes #1549 from pan3793/CELEBORN-638. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-06-16 18:18:26 +08:00
Angerszhuuuu	1ba6dee324	[CELEBORN-680][DOC] Refresh celeborn configurations in doc ### What changes were proposed in this pull request? Refresh celeborn configurations in doc ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1592 from AngersZhuuuu/CELEBORN-680. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>	2023-06-15 13:59:38 +08:00
Ethan Feng	5600728149	[CELEBORN-619][CORE][SHUFFLE] Support enable DRA with Apache Celeborn ### What changes were proposed in this pull request? Adapt Spark DRA patch for spark 3.4 ### Why are the changes needed? To support enabling DRA w/ Celeborn on Spark 3.4 ### Does this PR introduce _any_ user-facing change? Yes, this PR provides a DRA patch for Spark 3.4 ### How was this patch tested? Compiled with Spark 3.4 Closes #1529 from FMX/CELEBORN-619. Authored-by: Ethan Feng <ethanfeng@apache.org> Signed-off-by: Ethan Feng <ethanfeng@apache.org>	2023-06-05 09:50:05 +08:00
Cheng Pan	ef8e556202	[CELEBORN-604][SPARK] Support Spark 3.4 (#1509 )	2023-05-24 23:10:13 +08:00
minseok	6e166662f1	[CELEBORN-598] Fix Typos in README	2023-05-21 19:36:38 +08:00
Ethan Feng	7015d2463a	[CELEBORN-583] Merge pooled memory allocators. (#1490 )	2023-05-18 10:37:30 +08:00
Ethan Feng	91b757555e	[CELEBORN-570] Update docs about monitor and deployment. (#1478 )	2023-05-08 17:07:42 +08:00
Ethan Feng	58aa0ba48f	[CELEBORN-566] Refine docs to eliminate misleading configs. (#1473 )	2023-05-03 17:25:59 +08:00
Ethan Feng	537fc94df2	[CELEBORN-549] Update readme about deploy flink client. (#1454 )	2023-04-24 21:03:53 +08:00
Ethan Feng	8584f1049f	Add DingTalk Group info. (#1453 )	2023-04-24 10:11:24 +08:00
cxzl25	13f772e0c0	[CELEBORN-525] Fix wrong parameter celeborn.push.buffer.size	2023-04-14 20:45:25 +08:00
Ethan Feng	599bdbeb72	[CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354 )	2023-03-16 11:33:32 +08:00
zhongqiangchen	4fb5b3d547	[CELEBORN-298] Fix the wrong configuration name in readme and conf.template (#1234 )	2023-02-14 13:38:03 +08:00
Keyong Zhou	a67a275609	Update README.md	2023-02-01 10:46:55 +08:00
Cheng Pan	0c29c5dd57	[CELEBORN-180][BUILD][FOLLOWUP] Update CI workflow and docs (#1134 )	2023-01-03 17:58:51 +08:00
Ethan Feng	65cb36c002	[CELEBORN-83][FOLLOWUP] Fix various bugs when using HDFS as storage. (#1065 )	2022-12-15 15:20:29 +08:00
Ethan Feng	98864889c6	[CELEBORN-5] Update README for jira and slack. (#972 )	2022-11-15 18:42:36 +08:00
Gabriel	0b78cbfee0	[COMMUNITY] Update README (#971 )	2022-11-15 16:10:02 +08:00
leesf	3699683a3b	Fix and migrate some configs (#927 )	2022-11-07 09:41:38 +08:00
Cheng Pan	873eeeb1ed	[BUILD] Add apache- prefix in release tarball name (#854 )	2022-10-25 22:39:48 +08:00
Cheng Pan	8d7d397e71	Fix Configuration page and polish naming (#838 ) * Fix Configuration page and polish naming * nit * nit * comment	2022-10-24 12:46:25 +08:00
Cheng Pan	ea67f4e060	Introduce categories to ConfigEntry and migrate configurations (#775 )	2022-10-17 16:56:54 +08:00
Cheng Pan	5829bda21a	Rework and migrate HA configuration system (#763 )	2022-10-13 22:35:01 +08:00
Cheng Pan	f01a696313	Migrate and refactor configuration for master endpoints (#752 )	2022-10-11 21:33:21 +08:00
dxheming	7ef4144ced	[DOC] Modify build cmd (#758 )	2022-10-11 14:23:01 +08:00
Keyong Zhou	645339b024	Update README.md	2022-10-10 11:57:29 +08:00
Ethan Feng	59474c2f11	[INFRA]Update scripts and templates for new name. (#724 )	2022-10-09 14:56:06 +08:00
Cheng Pan	ab16b4f101	[INFRA] Rename modules w/ celeborn prefix (#723 )	2022-10-08 08:05:57 +08:00
Keyong Zhou	a2d2379153	[DOC] Replace RSS with Celeborn in docs (#715 )	2022-10-06 10:37:46 +08:00
Keyong Zhou	fe3b5988f2	[REFACTOR] Change package name to org.apache.celeborn (#710 )	2022-10-02 18:10:29 +08:00
Kerwin Zhang	10cfdec18f	[DOC] Update the calculation method of the worker's slot count (#702 )	2022-09-30 16:02:59 +08:00

74 Commits