celeborn

Author	SHA1	Message	Date
zky.zhoukeyong	6a5e3ed794	[CELEBORN-812] Cleanup SendBufferPool if idle for long ### What changes were proposed in this pull request? Cleans up the pooled send buffers and push tasks if the SendBufferPool has been idle for more than `celeborn.client.push.sendbufferpool.expireTimeout`. ### Why are the changes needed? Before this PR the SendBufferPool will cache the send buffers and push tasks forever. If they are large and will not be reused in the future, it wastes memory and causes GC. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passes GA and manual tests. Closes #1735 from waitinfuture/812-1. Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-20 00:34:55 +08:00
Angerszhuuuu	5471a6afe5	[CELEBORN-804] ShuffleClient should cleanup shuffle infos when trigger unregisterShuffle ### What changes were proposed in this pull request? After discussion, we make sure that `shuffleManager.unregisterShuffle()` will be triggered by Spark both in driver and executor. In this pr: 1. Add shuffle client both in driver and executor side in ShuffleManager 2. ShuffleClient call cleanupShuffle() when trigger `unregisterShuffle`. This replaced https://github.com/apache/incubator-celeborn/pull/1719 ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1726 from AngersZhuuuu/CELEBORN-804. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-07-19 20:50:18 +08:00
onebox-li	405b2801fa	[CELEBORN-810] Fix some typos and grammar ### What changes were proposed in this pull request? Fix some typos and grammar ### Why are the changes needed? Ditto ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? manually test Closes #1733 from onebox-li/fix-typo. Authored-by: onebox-li <lyh-36@163.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-19 18:35:38 +08:00
Angerszhuuuu	c8ad39d9bd	[CELEBORN-809] Directly use isDriver passed from SparkEnv ### What changes were proposed in this pull request? As title <img width="1051" alt="Screenshot 2023-07-19 1.01.25 PM" src="https://github.com/apache/incubator-celeborn/assets/46485123/26d506b2-bab9-43f5-9bbe-58d22a761bab"> ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1732 from AngersZhuuuu/CELEBORN-809. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>	2023-07-19 15:20:01 +08:00
Cheng Pan	0db919403e	Revert "[CELEBORN-798] Add heartbeat from client to LifecycleManager to clean…" This reverts commit `e56a8a8bed`.	2023-07-19 15:08:45 +08:00
zky.zhoukeyong	e56a8a8bed	[CELEBORN-798] Add heartbeat from client to LifecycleManager to cleanup client ### What changes were proposed in this pull request? Add heartbeat from client to lifecycle manager. In this PR, the heartbeat request contains the client's local shuffle ids; the lifecycle manager checks them against its local set and returns the ids it doesn't know. Upon receiving the response, the client calls ```unregisterShuffle``` for cleanup. ### Why are the changes needed? Before this PR, client side ```unregisterShuffle``` is never called. When running TPCDS 3T with spark thriftserver without DRA, I found the Executor's heap contains 1.6 million PartitionLocation objects (and StorageInfo): ![image](https://github.com/apache/incubator-celeborn/assets/948245/43658369-7763-4511-a5b0-9b3fbdf02005) After this PR, the number of PartitionLocation objects decreases to 275 thousand ![image](https://github.com/apache/incubator-celeborn/assets/948245/45f8f849-186d-4cad-83c8-64bd6d18debc) This heartbeat can be extended in the future for other purposes, e.g. reporting client metrics. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passes GA and manual test. Closes #1719 from waitinfuture/798. Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-17 18:14:10 +08:00
Cheng Pan	1ec4f4a9f5	[CELEBORN-801] Warn when local shuffle reader is enabled ### What changes were proposed in this pull request? Warn when local shuffle reader is enabled. ``` Detected spark.sql.adaptive.localShuffleReader.enabled (default is true) is enabled, it's highly recommended to disable it when use Celeborn as Remote Shuffle Service to avoid performance degradation. ``` ### Why are the changes needed? When local shuffle reader is enabled, the reduce task may read shuffle data by map id, which does not match the Celeborn shuffle data clustering model and causes extremely bad shuffle read performance. ### Does this PR introduce _any_ user-facing change? Yes, users will see a warning message in the Driver log when `spark.sql.adaptive.localShuffleReader.enabled` is true. ### How was this patch tested? Review. Closes #1721 from pan3793/CELEBORN-801. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-17 16:43:50 +08:00
zky.zhoukeyong	10a1def512	[CELEBORN-802] Reuse DataPusher#idleQueue by pooling to avoid too many byte[] objects ### What changes were proposed in this pull request? Reuse ```DataPusher#idleQueue``` by pooling in ```SendBufferPool``` to avoid too many ```byte[]``` objects in ```PushTask```. ### Why are the changes needed? I'm testing 3T TPCDS. Before this PR, containers were killed because of OOM, and GC is about 9.6h. For alive Executors, I dumped the memory and saw that the number of PushTask objects is about 20,000, and the number of ```64k``` byte[] objects is 23356, total around 1.7G: ![image](https://github.com/apache/incubator-celeborn/assets/948245/7b4ee4fa-7860-4ddb-b862-181a91748092) After this PR, no container is killed because of OOM, and GC is about 8.6h. I also dumped an Executor and found that the number of PushTask objects is 3584, and the number of ```64K``` byte[] objects is 5783, total around 361M: ![image](https://github.com/apache/incubator-celeborn/assets/948245/981e8f70-52f8-4bb1-9f67-9a8b4f398392) Also, before this PR, total execution time is ```3313.8s```, after this PR, total execution time is ```3229.5s```. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passes GA and Manual test. Closes #1722 from waitinfuture/802. Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-17 16:35:14 +08:00
zky.zhoukeyong	a7bbbd05c4	[CELEBORN-797] Decrease writeTime metric sampling frequency to improve perf ### What changes were proposed in this pull request? 1. Decrease writeTime metric sampling frequency to improve perf 2. Set default value of ```celeborn.<module>.push.timeoutCheck.threads``` and ```celeborn.<module>.fetch.timeoutCheck.threads``` to 4 ### Why are the changes needed? Following are test cases case 1: ```spark.sparkContext.parallelize(1 to 8000, 8000).flatMap( _ => (1 to 15000000).iterator.map(num => num)).repartition(8000).count``` // shuffle 1.1T data case 2: ```spark.sparkContext.parallelize(1 to 8000, 8000).flatMap( _ => (1 to 30000000).iterator.map(num => num)).repartition(8000).count``` // shuffle 2.2T data Following are e2e time of shuffle write stage \|\|Sort pusher before\|Sort pusher after\|Hash pusher before\|Hash pusher after\| \|----\|----\|----\|----\|-----\| \|case1\|4.4min\|4.1min\|4.4min\|3.9min\| \|case2\|9.1min\|8.4min\|9.7min\|8.5min\| ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passes GA and manual test. Closes #1718 from waitinfuture/797. Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-14 20:51:50 +08:00
无迹	e1337972e8	[CELEBORN-792] SparkShuffleManager.getWriter use wrong appUniqueId for Spark2 ### What changes were proposed in this pull request? As title ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GA and manual test. Closes #1717 from shujiewu/CELEBORN-792. Authored-by: 无迹 <peter.wsj@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-14 17:17:48 +08:00
Fu Chen	90ba9f3e87	[CELEBORN-783][FOLLOWUP] Private member updates and cleanup in `SortBasedPusher` ### What changes were proposed in this pull request? As title ### Why are the changes needed? https://github.com/apache/incubator-celeborn/pull/1699#discussion_r1259137323 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GA Closes #1704 from cfmcgrady/insert-record-followup. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-11 23:08:42 +08:00
Fu Chen	e47ec10cef	[CELEBORN-783] Revise the conditions for the `SortBasedPusher#insertRecord` method ### What changes were proposed in this pull request? As title ### Why are the changes needed? [comment](`7adf1fca41 (r121138008)`) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New UT Closes #1699 from cfmcgrady/insert-record. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-07-11 11:36:29 +08:00
Fu Chen	2bd1d86d41	[CELEBORN-775] Fix executorCores calculation in SparkShuffleManager for Spark local mode ### What changes were proposed in this pull request? As title ### Why are the changes needed? ```shell $ bin/spark-shell --master local[2] 23/07/06 16:11:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 23/07/06 16:11:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Spark context available as 'sc' (master = local[2], app id = local-1688631101733). Spark session available as 'spark'. Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.3.1 /_/ Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_292) Type in expressions to have them evaluated. Type :help for more information. scala> spark.sparkContext.getConf.get("spark.executor.cores") java.util.NoSuchElementException: spark.executor.cores at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.SparkConf.get(SparkConf.scala:245) ... 47 elided scala> ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CelebornPipelineSortSuite should cover this change Closes #1685 from cfmcgrady/local-core-number. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-07-06 16:29:59 +08:00
Angerszhuuuu	693172d0bd	[CELEBORN-751] Rename remain rss related class name and filenames etc ### What changes were proposed in this pull request? Rename remain rss related class name and filenames etc... ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1664 from AngersZhuuuu/CELEBORN-751. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Angerszhuuuu <angers.zhu@gmail.com>	2023-07-04 10:20:08 +08:00
Angerszhuuuu	5c7ecb8302	[CELEBORN-754][IMPORTANT] Provide a new SparkShuffleManager to replace RssShuffleManager in the future ### What changes were proposed in this pull request? Provide a new SparkShuffleManager to replace RssShuffleManager in the future ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1667 from AngersZhuuuu/CELEBORN-754. Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-30 17:27:33 +08:00
Angerszhuuuu	4c67325a3d	[CELEBORN-720][SPARK] Correct metric peakExecutionMemory of SortBasedShuffleWriter ### What changes were proposed in this pull request? Currently SortBasedShuffleWriter won't update peakMemoryUsedBytes, this pr support this. ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1632 from AngersZhuuuu/CELEBORN-720. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-27 18:40:06 +08:00
Fu Chen	4b8f126d54	[CELEBORN-716][BUILD] Correct the `to` name when renaming the Netty native library ### What changes were proposed in this pull request? As title ### Why are the changes needed? before this PR the `liborg_apache_celeborn_shaded_netty_transport_native_epoll_aarch_64.so` can't correctly be loaded. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually tested ```shell > tar zxf celeborn-client-spark-3-shaded_2.12-0.4.0-SNAPSHOT.jar > find * -name "*.so" META-INF/native/liborg_apache_celeborn_shaded_netty_transport_native_epoll_aarch_64.so META-INF/native/liborg_apache_celeborn_shaded_netty_transport_native_epoll_x86_64.so ``` Closes #1625 from cfmcgrady/typo. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-26 21:57:06 +08:00
Fu Chen	1b3ec61690	[CELEBORN-711][TEST] Rework PushDataTimeoutTest ### What changes were proposed in this pull request? 1. separated push data timeout tests and push merge data timeout tests in `PushDataTimeoutTest` 2. updated the test results assertion 3. rework `pushdata timeout will add to blacklist` ### Why are the changes needed? ensure that the timeout behavior is correctly implemented https://github.com/apache/incubator-celeborn/pull/1613#discussion_r1236423721 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing tests Closes #1620 from cfmcgrady/push-timeout-test. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-26 13:45:27 +08:00
zky.zhoukeyong	6b82ecdfa0	[CELEBORN-712] Make appUniqueId a member of ShuffleClientImpl and refactor code ### What changes were proposed in this pull request? Make appUniqueId a member of ShuffleClientImpl and remove applicationId from RPC messages across client side, so it won't cause compatibility issues. ### Why are the changes needed? Currently Celeborn Client is bound to a single application id, so there's no need to pass applicationId around in many RPC messages in client side. ### Does this PR introduce _any_ user-facing change? In some logs the application id will not be printed, which should not be a problem. ### How was this patch tested? UTs. Closes #1621 from waitinfuture/appid. Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-06-25 21:37:16 +08:00
Fu Chen	18f2be0fbe	[CELEBORN-693][SPARK] Align the `incWriterTime` in the hash-based shuffle writer with the sort-based shuffle ### What changes were proposed in this pull request? As title. ### Why are the changes needed? https://github.com/apache/incubator-celeborn/pull/1585#issuecomment-1589164128 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? tested locally. Closes #1604 from cfmcgrady/hash-based-writer-metrics. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-19 15:42:01 +08:00
sychen	e734ceb558	[MINOR] Cleanup code ### What changes were proposed in this pull request? 1. Use `<arg>-Ywarn-unused-import</arg>` to remove some unused imports There is no way to use `<arg>-Ywarn-unused-import</arg>` at this stage Because we have the following code ``` // Can Remove this if celeborn don't support scala211 in future import org.apache.celeborn.common.util.FunctionConverter._ ``` 2. Fix scala case match not fully covered, avoid `scala.MatchError` 3. Fixed some scala compilation warnings ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1600 from cxzl25/cleanup_code. Authored-by: sychen <sychen@ctrip.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-06-19 11:31:51 +08:00
Fu Chen	b9c9c00697	[CELEBORN-683][SPARK][PERF] Avoid calling `CelebornConf.get` multi-time when columnar shuffle write is enabled. ### What changes were proposed in this pull request? as title. ### Why are the changes needed? flame graph and stage duration before: ![Screenshot 2023-06-15 4.49.04 PM](https://github.com/apache/incubator-celeborn/assets/8537877/6fe7f7f6-fd36-42ec-a6a1-9a4943022dc8) ![Screenshot 2023-06-15 4.57.53 PM](https://github.com/apache/incubator-celeborn/assets/8537877/077f6c22-4dc9-497a-affe-ddba9200fe28) flame graph and stage duration after: ![Screenshot 2023-06-15 4.37.45 PM](https://github.com/apache/incubator-celeborn/assets/8537877/d6ae7aa6-95c7-490e-a0ae-c110e6a83e5a) ![Screenshot 2023-06-15 4.58.12 PM](https://github.com/apache/incubator-celeborn/assets/8537877/e8dd5c3b-94d9-47d7-a644-4897acef43ad) ### Does this PR introduce _any_ user-facing change? No, only perf improvement. ### How was this patch tested? tested locally. Closes #1595 from cfmcgrady/columnar-conf. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-15 17:52:23 +08:00
Fu Chen	86cbf7a359	[CELEBORN-673][SPARK][PERF] Improve the perf of sort-based shuffle write ### What changes were proposed in this pull request? 1. `SQLShuffleWriteMetricsReporter#incWriteTime` is a performance killer, stop calling it each time we insert a record 2. simplify the `incWriteTime` logic for handling large records, also including the time required for memory copying ### Why are the changes needed? flame graph and stage duration before: ![Screenshot 2023-06-13 3.30.53 PM](https://github.com/apache/incubator-celeborn/assets/8537877/5fb0a242-82d1-4348-aeaa-4af75a012308) ![Screenshot 2023-06-13 3.31.26 PM](https://github.com/apache/incubator-celeborn/assets/8537877/3ded2f16-1c17-4120-8d10-31ea7b5182a2) flame graph and stage duration after: ![Screenshot 2023-06-13 3.33.08 PM](https://github.com/apache/incubator-celeborn/assets/8537877/fbe45cf2-4d23-4d6c-a476-64338e1610f1) ![Screenshot 2023-06-13 3.33.59 PM](https://github.com/apache/incubator-celeborn/assets/8537877/9129d771-ad36-42e9-86b7-e454d2f8e0b0) ### Does this PR introduce _any_ user-facing change? No, only perf improvement ### How was this patch tested? tested locally. Closes #1585 from cfmcgrady/shuffle-metrics. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-13 19:07:04 +08:00
Fu Chen	79806b27ca	[CELEBORN-664][SPARK][PERF] Improve the perf of columnar shuffle write ### What changes were proposed in this pull request? per https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex, use `while` loop for performance-sensitive code flame graph and shuffle write time before: ![Screenshot 2023-06-12 4.18.24 PM](https://github.com/apache/incubator-celeborn/assets/8537877/59d94e05-71b5-4474-bebe-66df554ccc48) ![Screenshot 2023-06-12 4.19.56 PM](https://github.com/apache/incubator-celeborn/assets/8537877/e24bb8b2-5b16-431b-92ae-cb8216e69d16) flame graph and shuffle write time after: ![Screenshot 2023-06-12 4.18.38 PM](https://github.com/apache/incubator-celeborn/assets/8537877/18a84774-2197-487d-aa51-b33445619210) ![Screenshot 2023-06-12 4.21.39 PM](https://github.com/apache/incubator-celeborn/assets/8537877/26d95e5a-6e68-46b7-8c8c-49eb2d2e252f) ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1577 from cfmcgrady/columnar-perf. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-06-12 18:46:00 +08:00
Fu Chen	cc716506f9	[CELEBORN-659][SPARK][TEST] Refine RssShuffleWriterSuiteJ ### What changes were proposed in this pull request? 1. renamed `RssShuffleWriterSuiteJ` to `CelebornShuffleWriterSuiteBase`, which now serves as an abstract base class. 2. two new classes, `HashBasedShuffleWriterSuiteJ` and `SortBasedShuffleWriterSuiteJ`, have been added. These classes extend `CelebornShuffleWriterSuiteBase` and provide suites for testing hash-based and sort-based shuffle writers. ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #1570 from cfmcgrady/sort-based-writer-suite. Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-06-12 13:48:52 +08:00
Cheng Pan	76533d7324	[CELEBORN-650][TEST] Upgrade scalatest and unify mockito version ### What changes were proposed in this pull request? This PR upgrades - `mockito` from 1.10.19 and 3.6.0 to 4.11.0 - `scalatest` from 3.2.3 to 3.2.16 - `mockito-scalatest` from 1.16.37 to 1.17.14 ### Why are the changes needed? Housekeeping, making test dependencies up-to-date and unified. ### Does this PR introduce _any_ user-facing change? No, it only affects test. ### How was this patch tested? Pass GA. Closes #1562 from pan3793/CELEBORN-650. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-09 10:04:14 +08:00
Cheng Pan	6b64b1de9c	[CELEBORN-648][SPARK] Improve perf of SendBufferPool and logs about memory ### What changes were proposed in this pull request? - Replace index-based item access with an iterator for LinkedList. - Always try to remove a buffer if SendBufferPool does not have a matched candidate, this change makes the total buffer number from `capacity+N-1` to `capacity` in worst cases. - Some logs and code polish. ### Why are the changes needed? Improve performance and logs, reduce memory consumption. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #1560 from pan3793/CELEBORN-648. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-09 09:45:27 +08:00
Cheng Pan	0636e3ca40	[CELEBORN-654][SPARK] SortBasedShuffleWriter does not require mapStatusRecords in Spark 3 ### What changes were proposed in this pull request? `mapStatusRecords` is required in Spark 2 for constructing `MapStatus` when AQE is enabled, but not in Spark 3, so remove it to save memory and compute resources. This PR also simplifies the `for loop` code. ### Why are the changes needed? Remove unnecessary variables to save resources and clean up code. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #1564 from pan3793/CELEBORN-654. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-09 09:43:08 +08:00
Cheng Pan	1ae8eb7145	[CELEBORN-655][SPARK] Rename newAppId to appUniqueId ### What changes were proposed in this pull request? Rename variable `newAppId` to `appUniqueId` in Spark client. ### Why are the changes needed? Make the variable name intuitive. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #1565 from pan3793/CELEBORN-655. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>	2023-06-08 22:14:20 +08:00
Cheng Pan	5bc37f1286	[CELEBORN-637] Remove support for rss.* configuration alias ### What changes were proposed in this pull request? Remove support for `rss.` configuration alias ### Why are the changes needed? The legacy `rss.` configuration alias was added during Celeborn entering Apache Incubator, to simplify users' migration from RSS to Celeborn. Lots of configuration changes happened after Celeborn 0.2, the `rss.` configuration alias become less helpful, so remove it to clean up the code. ### Does this PR introduce _any_ user-facing change? Yes, but it's expected, the `rss.` compatibility has never been documented. ### How was this patch tested? Pass GA. Closes #1547 from pan3793/CELEBORN-637. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>	2023-06-07 22:28:36 +08:00
xiyu.zk	82bdea7085	[CELEBORN-620] Fix columnar shuffle codegen exception ### What changes were proposed in this pull request? Fix columnar shuffle codegen exception. This is a refactoring of #1523. Closes #1543 from kerwin-zk/issue-620. Authored-by: xiyu.zk <xiyu.zk@alibaba-inc.com> Signed-off-by: xiyu.zk <xiyu.zk@alibaba-inc.com>	2023-06-05 12:05:06 +08:00
Angerszhuuuu	4df4775524	[CELEBORN-632][DOC] Add spark name space to spark specify properties (#1538)	2023-06-02 21:48:56 +08:00
Ethan Feng	d33916e571	[CELEBORN-625] Add a config to enable/disable UnsafeRow fast write. (#1532)	2023-06-01 20:55:45 +08:00
Angerszhuuuu	cf308aa057	[CLEBORN-595] Refine code frame of CelebornConf (#1525)	2023-06-01 10:37:58 +08:00
Angerszhuuuu	62681ba85d	[CELEBORN-595] Rename and refactor the configuration doc. (#1501)	2023-05-30 15:14:12 +08:00
Cheng Pan	ef8e556202	[CELEBORN-604][SPARK] Support Spark 3.4 (#1509)	2023-05-24 23:10:13 +08:00
Angerszhuuuu	a22c61e479	[CELEBORN-582] Celeborn should handle InterruptedException during kill task properly (#1486)	2023-05-17 18:17:41 +08:00
Angerszhuuuu	783d4e5dc5	[CELEBORN-551] Remove unnecessary ShuffleClient.get() (#1456)	2023-05-04 20:47:45 +08:00
cxzl25	13f772e0c0	[CELEBORN-525] Fix wrong parameter celeborn.push.buffer.size	2023-04-14 20:45:25 +08:00
Kerwin Zhang	27a1f369cf	[CELEBORN-472] Support using Celeborn in the scenario of switching multiple SparkContexts in the same process (#1379)	2023-03-27 16:10:34 +08:00
Keyong Zhou	7adf1fca41	[CELEBORN-295] Optimize data push (#1232) * [CELEBORN-295] Add double buffer for sort pusher	2023-02-28 10:35:55 +08:00
jiaoqingbo	bd9e0ddc1f	[CELEBORN-304] Missing setIfMissing `celeborn.$module.io.serverThreads` (#1238)	2023-02-15 15:49:08 +08:00
Angerszhuuuu	c410392284	[CELEBORN-265] Integration with Spark3.0 cast class exception of ShuffleHandler (#1197) * [CELEBORN-265] Integration with Spark3.0 cast class exception of ShuffleHandler	2023-02-02 11:52:51 +08:00
Keyong Zhou	e47f1e33b0	[CELEBORN-55][FOLLOWUP] Code refine (#1175)	2023-01-20 16:22:47 +08:00
zy.jordan	c5be79ee3d	[CELEBORN-55][FEATURE] Split maxReqsInFlight limitation into level of target worker (#1102)	2023-01-20 10:18:45 +08:00
jxysoft	41b1fa46d3	[CELEBORN-185][SPARK] Can't release shuffle data if rss fallback to nss (#1133) Co-authored-by: xianyao.jiang <xianyao.jiang@antfin.com>	2023-01-03 14:28:09 +08:00
nafiy	ddab27a1d7	[CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse (#1093) * [CELEBORN-145][REFACTOR] Add reason in CheckQuotaResponse	2022-12-15 18:16:34 +08:00
Cheng Pan	ec371c0026	[CELEBORN-132] ShuffleClient should not implement Cloneable (#1077)	2022-12-14 10:04:39 +08:00
Angerszhuuuu	dac2ba6b40	[CELEBORN-114][REFACTOR] Keep same log code in spark2/spark3 of quota exceed (#1058)	2022-12-09 12:13:01 +08:00
nafiy	529bb22781	[ISSUE-958][REFACTOR] Add and modify log of fallback policy (#965)	2022-11-14 20:16:33 +08:00


132 Commits