kyuubi/externals
Fu Chen 1a651254cb
[KYUUBI #4662] [ARROW] Arrow serialization should not introduce extra shuffle for outermost limit
### _Why are the changes needed?_

The core idea is to run the job the same way `CollectLimitExec.executeCollect()` does, so that Arrow serialization does not add an extra shuffle for an outermost `LIMIT`.
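
As a rough illustration of that collect-limit style of execution, the sketch below simulates an incremental take over in-memory "partitions": it reads one partition first and widens the scan only if more rows are still needed, so no shuffle into a single partition is required. It is a simplified model, not the code in this PR; `incremental_take`, `partitions`, and `scale_up_factor` are hypothetical names, and the growth rule is only loosely modeled on Spark's `spark.sql.limit.scaleUpFactor` behavior.

```python
from typing import List, Sequence, Tuple


def incremental_take(
    partitions: Sequence[Sequence[int]],
    limit: int,
    scale_up_factor: int = 4,  # assumption: mirrors the idea of a scale-up factor
) -> Tuple[List[int], int]:
    """Collect up to `limit` rows, scanning partitions in growing rounds.

    Returns the collected rows and the number of partitions scanned.
    """
    rows: List[int] = []
    scanned = 0
    while len(rows) < limit and scanned < len(partitions):
        # The first round reads a single partition; later rounds widen the scan.
        to_try = 1 if scanned == 0 else scanned * scale_up_factor
        batch = partitions[scanned:scanned + to_try]
        for part in batch:
            # Keep only as many rows as are still needed to reach the limit.
            remaining = limit - len(rows)
            rows.extend(part[:remaining])
        scanned += len(batch)
    return rows, scanned
```

With 100 partitions of 10 rows each and a limit of 20, this sketch touches only the first 5 partitions instead of all 100, which is the kind of saving the screenshots in this description are meant to show.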

```sql
select * from parquet.`parquet/tpcds/sf1000/catalog_sales` limit 20;
```

Before this PR:
![Screenshot 2023-04-04 3 20 34 PM](https://user-images.githubusercontent.com/8537877/229717946-87c480c6-9550-4d00-bc96-14d59d7ce9f7.png)

![Screenshot 2023-04-04 3 20 54 PM](https://user-images.githubusercontent.com/8537877/229717973-bf6da5af-74e7-422a-b9fa-8b7bebd43320.png)

After this PR:

![Screenshot 2023-04-04 3 17 05 PM](https://user-images.githubusercontent.com/8537877/229718016-6218d019-b223-4deb-b596-6f0431d33d2a.png)

![Screenshot 2023-04-04 3 17 16 PM](https://user-images.githubusercontent.com/8537877/229718046-ea07cd1f-5ffc-42ba-87d5-08085feb4b16.png)

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #4662 from cfmcgrady/arrow-collect-limit-exec-2.

Closes #4662

82c912ed6 [Fu Chen] close vector
130bcb141 [Fu Chen] finally close
facc13f78 [Fu Chen] exclude rule OptimizeLimitZero
370083910 [Fu Chen] SparkArrowbasedOperationSuite adapt Spark-3.1.x
6064ab961 [Fu Chen] limit = 0 test case
6d596fcce [Fu Chen] address comment
8280783c3 [Fu Chen] add `isStaticConfigKey` to adapt Spark-3.1.x
22cc70fba [Fu Chen] add ut
b72bc6fb2 [Fu Chen] add offset support to adapt Spark-3.4.x
9ffb44fb2 [Fu Chen] make toBatchIterator private
c83cf3f5e [Fu Chen] SparkArrowbasedOperationSuite adapt Spark-3.1.x
573a262ed [Fu Chen] fix
4cef20481 [Fu Chen] SparkArrowbasedOperationSuite adapt Spark-3.1.x
d70aee36b [Fu Chen] SparkPlan.session -> SparkSession.active to adapt Spark-3.1.x
e3bf84c03 [Fu Chen] refactor
81886f01c [Fu Chen] address comment
2286afc6b [Fu Chen] reflective call AdaptiveSparkPlanExec.finalPhysicalPlan
03d074732 [Fu Chen] address comment
25e4f056c [Fu Chen] add docs
885cf2c71 [Fu Chen] infer row size by schema.defaultSize
4e7ca54df [Fu Chen] unnecessary changes
ee5a7567a [Fu Chen] revert unnecessary changes
6c5b1eb61 [Fu Chen] add ut
4212a8967 [Fu Chen] refactor and add ut
ed8c6928b [Fu Chen] refactor
008867122 [Fu Chen] refine
8593d856a [Fu Chen] driver slice last batch
a5849430a [Fu Chen] arrow take

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
2023-04-10 09:43:30 +08:00
..
kyuubi-chat-engine [KYUUBI #4558] [CHAT] Make ChatGPT model ID configurable 2023-03-19 22:27:58 +08:00
kyuubi-download [KYUUBI #4348] [INFRA] Cache engine archives in CI jobs for maven download plugin 2023-02-18 22:34:02 +08:00
kyuubi-flink-sql-engine [KYUUBI #1652] Support Flink yarn application mode 2023-04-07 18:51:48 +08:00
kyuubi-hive-sql-engine [KYUUBI #4412][FOLLOWUP] Align the server/engine session handle for flink/hive/trino/jdbc engines 2023-02-27 20:57:19 +08:00
kyuubi-jdbc-engine [KYUUBI #4412][FOLLOWUP] Align the server/engine session handle for flink/hive/trino/jdbc engines 2023-02-27 20:57:19 +08:00
kyuubi-spark-sql-engine [KYUUBI #4662] [ARROW] Arrow serialization should not introduce extra shuffle for outermost limit 2023-04-10 09:43:30 +08:00
kyuubi-trino-engine [KYUUBI #4522] use:catalog should execute before than use:database 2023-04-04 10:56:43 +08:00