### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
Before this PR, ```Flusher#takeBuffer``` returned a ```CompositeByteBuf``` that is unpooled and on heap:
```
buffer = Unpooled.compositeBuffer(maxComponents)
```
Netty defines `Unpooled.compositeBuffer` as:
```
public static CompositeByteBuf compositeBuffer(int maxNumComponents) {
    return new CompositeByteBuf(ALLOC, /*direct*/ false, maxNumComponents);
}
```
When consolidation happens, the data is copied from direct memory to heap memory, which causes OOM and
performance degradation.
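To illustrate the copy described above, here is a minimal sketch using plain `java.nio.ByteBuffer` rather than Netty or the actual `Flusher` code. The class `ConsolidationSketch` and its `consolidate` helper are hypothetical names made up for this example. It only shows that merging direct components into a heap-backed target moves the bytes onto the heap, while a direct target keeps them off heap.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class ConsolidationSketch {
    // Hypothetical helper: merges all components into one newly allocated buffer.
    // direct=false mimics a heap-backed composite; direct=true keeps data off heap.
    static ByteBuffer consolidate(List<ByteBuffer> components, boolean direct) {
        int total = 0;
        for (ByteBuffer c : components) {
            total += c.remaining();
        }
        ByteBuffer merged = direct
            ? ByteBuffer.allocateDirect(total)
            : ByteBuffer.allocate(total);
        for (ByteBuffer c : components) {
            // duplicate() so the caller's position and limit are left untouched
            merged.put(c.duplicate());
        }
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        // Two components that live in direct memory, as incoming data would.
        List<ByteBuffer> parts = List.of(
            ByteBuffer.allocateDirect(4),
            ByteBuffer.allocateDirect(8));

        ByteBuffer onHeap = consolidate(parts, false);
        ByteBuffer offHeap = consolidate(parts, true);

        System.out.println("heap target is direct:   " + onHeap.isDirect());
        System.out.println("direct target is direct: " + offHeap.isDirect());
    }
}
```

With a heap target, every consolidation allocates a fresh heap array sized to the merged data, which is the kind of allocation that adds GC pressure under heavy shuffle load.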
With this PR, in my test cases of shuffling 14G with three 1G/1G workers, I no longer see the disk buffer grow larger than direct memory,
nor do I encounter high GC.
### Does this PR introduce _any_ user-facing change?
This patch fixes some OOM issues.
### How was this patch tested?
Passes GA and manual tests.
Closes #1709 from waitinfuture/790.
Authored-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>