### What changes were proposed in this pull request?
While SortBasedWriter has a smaller memory footprint than HashBasedWriter, it performs poorly when there are many partitions and the write buffer quickly fills with small chunks of data.
For example, suppose the sort buffer size is 32K and there are 4 partitions (A, B, C, D) with 128K of data in total, arriving 8K per partition at a time. The sort buffer then fills after every round of 4 x 8K, so each partition's data is compressed and sent as a small 8K chunk 4 times, which is very costly. HashBasedWriter does not have this problem, since a push only happens when the per-partition buffer is full. A larger sort buffer size can mitigate the issue, but tuning the sort buffer size per job is tedious.
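The arithmetic in this example can be checked with a small simulation. This is only a sketch, not the Celeborn implementation: the class and method names are hypothetical, the flush rule is simplified to "flush when buffered bytes reach the limit", and the 32K per-partition buffer used for the hash-based case is an assumption (the example above does not state it).

```java
// Hypothetical sketch: counts pushes for round-robin arrivals of fixed-size chunks.
final class PushCountSketch {

    // Sort-based: one shared buffer. When it fills, every partition's
    // buffered bytes are compressed and pushed as a separate chunk.
    static int sortBasedPushes(int sortBufferBytes, int partitions, int chunkBytes, int rounds) {
        int[] buffered = new int[partitions];
        int used = 0;
        int pushes = 0;
        for (int r = 0; r < rounds; r++) {
            for (int p = 0; p < partitions; p++) {
                buffered[p] += chunkBytes;
                used += chunkBytes;
                if (used >= sortBufferBytes) {
                    pushes += flush(buffered);
                    used = 0;
                }
            }
        }
        return pushes + flush(buffered); // push whatever is left at the end
    }

    // Hash-based: one buffer per partition, pushed only when that buffer is full.
    static int hashBasedPushes(int perPartitionBufferBytes, int partitions, int chunkBytes, int rounds) {
        int[] buffered = new int[partitions];
        int pushes = 0;
        for (int r = 0; r < rounds; r++) {
            for (int p = 0; p < partitions; p++) {
                buffered[p] += chunkBytes;
                if (buffered[p] >= perPartitionBufferBytes) {
                    pushes++;
                    buffered[p] = 0;
                }
            }
        }
        return pushes + flush(buffered);
    }

    // Pushes every non-empty partition buffer and returns how many pushes that took.
    private static int flush(int[] buffered) {
        int pushes = 0;
        for (int q = 0; q < buffered.length; q++) {
            if (buffered[q] > 0) {
                pushes++;
                buffered[q] = 0;
            }
        }
        return pushes;
    }
}
```

With a 32K sort buffer, 4 partitions, 8K chunks and 4 rounds, the sort-based path issues 16 pushes of 8K each, while the hash-based path (under the assumed 32K per-partition buffer) issues 4 pushes of 32K each. Raising the sort buffer to 128K in this sketch also brings the sort-based path down to 4 pushes.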
This PR introduces a new feature that measures the total bytes and count of the pushes actually performed, as well as the "should-push" bytes and count. A push counts as "should-push" when the data pushed is larger than CLIENT_PUSH_BUFFER_MAX_SIZE; in other words, HashBasedWriter would also have triggered a push in this case.
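The bookkeeping described here could look roughly like the following. This is a minimal sketch under assumptions: the class name, field names and `record` method are hypothetical and are not taken from the patch; only the classification rule (a push larger than CLIENT_PUSH_BUFFER_MAX_SIZE is a "should-push") comes from the description.

```java
// Hypothetical sketch of the push statistics; names are illustrative only.
final class PushStatsSketch {
    private final long pushBufferMaxSize; // stands in for CLIENT_PUSH_BUFFER_MAX_SIZE

    long pushedBytes;      // bytes of all pushes actually performed
    long pushedCount;      // number of pushes actually performed
    long shouldPushBytes;  // bytes of pushes larger than pushBufferMaxSize
    long shouldPushCount;  // number of pushes larger than pushBufferMaxSize

    PushStatsSketch(long pushBufferMaxSize) {
        this.pushBufferMaxSize = pushBufferMaxSize;
    }

    // Record one push of the given size.
    void record(long bytes) {
        pushedBytes += bytes;
        pushedCount++;
        // A push this large would also have been triggered by a hash-based writer.
        if (bytes > pushBufferMaxSize) {
            shouldPushBytes += bytes;
            shouldPushCount++;
        }
    }
}
```

For instance, with a 64K limit, recording three 8K pushes and one 80K push leaves `pushedCount` at 4 and `shouldPushCount` at 1.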
When actualPushedBytes/actualPushedCounts > (1 + Threshold) * (ShouldPushBytes/ShouldPushCounts), we enlarge the sort buffer size by 1X (that is, double it) to buffer more data before pushing. The sort buffer size is capped at (number of partitions) * CLIENT_PUSH_BUFFER_MAX_SIZE.
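The growth step itself is simple to sketch. The code below assumes that "enlarge by 1X" means doubling and covers only the resize and its cap; the trigger condition is left to the caller. The class and method names are hypothetical and not taken from the patch.

```java
// Hypothetical sketch of the resize step; not the actual Celeborn code.
final class SortBufferGrowthSketch {

    // Returns the next sort buffer size: double the current size, but never
    // exceed numPartitions * pushBufferMaxSize (CLIENT_PUSH_BUFFER_MAX_SIZE).
    static long nextBufferSize(long currentSize, int numPartitions, long pushBufferMaxSize) {
        long cap = (long) numPartitions * pushBufferMaxSize;
        return Math.min(currentSize * 2, cap);
    }
}
```

With 4 partitions and a 64K push buffer limit, the cap is 256K: a 32K sort buffer grows to 64K, a 192K buffer grows only to 256K, and a 256K buffer stays at 256K.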
### Why are the changes needed?
To reduce the performance cost of the sort-based writer.
### Does this PR introduce _any_ user-facing change?
No, but it adds 2 extra configurations.
### How was this patch tested?
Tested in our company's production environment and with unit tests.
Closes #2358 from CodingCat/adaptive_memory_threshold.
Authored-by: CodingCat <zhunansjtu@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>