A quick heads-up on the performance of message batching: I ran MPerf and UPerf and got the results shown at the bottom of .
The tests were run on a 4-node cluster; for cluster sizes of 6 and 8, I ran 2 processes on the same physical box.
MPerf shows slightly better performance for 2 and 4 nodes, but significantly (10%) better performance when running more than 1 process on the same box (6 and 8 nodes). I think the reason is that, under contention, message batching acquires fewer locks, which reduces lock contention.
UnicastTestRPC shows exactly the same performance for the old code (no message batching) and the new code (with message batching). The main reason is that we use synchronous RPCs and a single sender, which doesn't take advantage of message batching at all, as no message bundles are sent across the wire.
UPerf shows significantly better performance for 4 nodes (11%) and 8 nodes (16%). I guess the reason is that here we do make use of message batching, as we have multiple sender threads and higher contention than in the previous test.
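To make the lock argument concrete, here is a minimal sketch of why a batch needs fewer lock acquisitions than the same messages delivered one by one. It is not JGroups code: `LockCountingWindow`, `addSingle` and `addBatch` are hypothetical names, and messages are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical receive window guarded by one lock; not a JGroups class. */
final class LockCountingWindow {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> delivered = new ArrayList<>();
    private int lockAcquisitions;

    /** Per-message path: every message pays for one lock acquisition. */
    void addSingle(String msg) {
        lock.lock();
        try {
            lockAcquisitions++;
            delivered.add(msg);
        } finally {
            lock.unlock();
        }
    }

    /** Batched path: the whole batch is added under a single lock acquisition. */
    void addBatch(List<String> batch) {
        lock.lock();
        try {
            lockAcquisitions++;
            delivered.addAll(batch);
        } finally {
            lock.unlock();
        }
    }

    /** Number of times one of the add methods took the lock. */
    int lockAcquisitions() {
        lock.lock();
        try {
            return lockAcquisitions;
        } finally {
            lock.unlock();
        }
    }

    /** Number of messages stored so far. */
    int size() {
        lock.lock();
        try {
            return delivered.size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        List<String> msgs = Arrays.asList("m1", "m2", "m3", "m4");

        LockCountingWindow perMessage = new LockCountingWindow();
        for (String m : msgs) {
            perMessage.addSingle(m);
        }

        LockCountingWindow batched = new LockCountingWindow();
        batched.addBatch(msgs);

        // prints "per-message: 4 acquisitions, batched: 1"
        System.out.println("per-message: " + perMessage.lockAcquisitions()
                + " acquisitions, batched: " + batched.lockAcquisitions());
    }
}
```

With several threads competing for the same lock, fewer acquisitions means fewer chances to block, which would fit the larger gains seen when 2 processes share a box.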
This is not the end of the line, as I haven't yet implemented message batching in the protocols above NAKACK2 and UNICAST2: currently, messages are sent up in batches from the transport to NAKACK2 (multicast messages) or UNICAST2 (unicast messages), but from there on, they're sent up individually. This will get changed in , but because this is a lot of work and will affect many classes, I decided to split the work into two parts.
The first part has been merged into master (3.3), and it would be interesting to get feedback from people trying it out!
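As a rough illustration of the hand-off described above, the sketch below models a layer that receives a batch from below and either unrolls it into individual up-calls (the current behavior) or forwards it as a batch (my assumption about what the second part might look like). All types here (`Layer`, `CountingTop`, `ReliabilityLayer`) are hypothetical stand-ins, not the JGroups protocol classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of batches travelling up a protocol stack; not JGroups code. */
final class BatchHandoffDemo {

    /** Hypothetical layer interface with a per-message and a per-batch up-call. */
    interface Layer {
        void up(String msg);
        void upBatch(List<String> batch);
    }

    /** Top of the stack: counts how many up-calls reach it. */
    static final class CountingTop implements Layer {
        int upCalls;
        final List<String> received = new ArrayList<>();

        public void up(String msg) {
            upCalls++;
            received.add(msg);
        }

        public void upBatch(List<String> batch) {
            upCalls++;
            received.addAll(batch);
        }
    }

    /** Stand-in for a layer in the position of NAKACK2 or UNICAST2. */
    static final class ReliabilityLayer implements Layer {
        private final Layer above;
        private final boolean forwardBatches; // assumed switch, for illustration only

        ReliabilityLayer(Layer above, boolean forwardBatches) {
            this.above = above;
            this.forwardBatches = forwardBatches;
        }

        public void up(String msg) {
            above.up(msg);
        }

        public void upBatch(List<String> batch) {
            if (forwardBatches) {
                above.upBatch(batch);   // batch stays intact on its way up
            } else {
                for (String msg : batch) {
                    above.up(msg);      // batch is unrolled into single messages
                }
            }
        }
    }

    /** Sends one batch into the layer and reports the up-calls seen at the top. */
    static int upCallsAtTop(boolean forwardBatches, List<String> batch) {
        CountingTop top = new CountingTop();
        new ReliabilityLayer(top, forwardBatches).upBatch(batch);
        return top.upCalls;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("m1", "m2", "m3");
        // prints "unrolled: 3 up-calls, forwarded: 1"
        System.out.println("unrolled: " + upCallsAtTop(false, batch)
                + " up-calls, forwarded: " + upCallsAtTop(true, batch));
    }
}
```

In this toy model, every layer above the unrolling point sees one call per message, which is why extending batching further up the stack touches many classes.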