Reducing network latency

Ping latency

A tool I used regularly when I was a Unix system administrator was ping:

ping -f hostname

This sends a "flood" of packets across the network. It only sends 5 per second, but it gives you an idea of the round trip latency of the network.

I have two servers which are close to each other, so I should be able to get their latencies pretty low. ping reports an average latency of 107 μs (microseconds), which seems a reasonable starting point.

All the timings in this article are round trip times (RTT): the time to go from a process on one machine to a process on a second machine and back again.
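
As a rough sketch of how round trip samples can be reduced to a typical value and a high percentile, the Java below sorts the samples and picks values by nearest rank. It assumes "typical" means the median, which the article does not state. The class RttSummary and its method names are hypothetical and are not part of Netty or Chronicle. The timed operation is only a placeholder.

```java
import java.util.Arrays;

/** Hypothetical helper for summarising round trip samples; not the article's test harness. */
final class RttSummary {

    /** Returns the sample at percentile p (0 < p <= 1) using the nearest-rank method. */
    static long percentile(long[] samplesNanos, double p) {
        if (samplesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesNanos.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[10_000];
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            // Placeholder: a real test would send a request here and wait for the reply.
            samples[i] = System.nanoTime() - start; // here this is just the timer overhead
        }
        System.out.printf("typical %.2f us, 99.9%%ile %.2f us%n",
                percentile(samples, 0.5) / 1e3,
                percentile(samples, 0.999) / 1e3);
    }
}
```

Reporting a high percentile alongside the median matters because the occasional slow round trip is invisible in an average.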

Java to Java latency with new TCP connections

Creating a new connection each time you want to send a request or a message is relatively expensive. However, many applications still work this way, so it is useful to get an idea of the latencies this incurs.
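
To make the pattern concrete, here is a minimal sketch of a client that opens a new TCP connection for every request, using only java.net. It is not the EchoReconnectingClientMain used for the tables. The class name, the 32-byte message and the loopback echo server are assumptions made so the example runs on one machine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

/** Hypothetical sketch: one new TCP connection per request, against a loopback echo server. */
final class ReconnectPerRequest {

    /** Starts a daemon thread that echoes bytes back, one connection at a time. */
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread t = new Thread(() -> {
            byte[] buf = new byte[256];
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        out.write(buf, 0, n);
                        out.flush();
                    }
                } catch (IOException e) {
                    // Server closed or client went away; the loop re-checks isClosed().
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    /** Times connect + send + receive for each request; every request uses a fresh socket. */
    static long[] measure(int port, int requests) throws IOException {
        long[] rttNanos = new long[requests];
        byte[] msg = new byte[32];
        for (int i = 0; i < requests; i++) {
            long start = System.nanoTime();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
                s.setTcpNoDelay(true); // send small messages immediately
                s.getOutputStream().write(msg);
                InputStream in = s.getInputStream();
                int read = 0;
                while (read < msg.length) {
                    int n = in.read(msg, read, msg.length - read);
                    if (n < 0) throw new IOException("connection closed early");
                    read += n;
                }
                rttNanos[i] = System.nanoTime() - start;
            }
        }
        return rttNanos;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startEchoServer()) {
            long[] rtts = measure(server.getLocalPort(), 200);
            Arrays.sort(rtts);
            System.out.printf("median %.1f us%n", rtts[rtts.length / 2] / 1e3);
        }
    }
}
```

The timing deliberately includes the connect, since that is the cost this style of client pays on every message. Loopback numbers from this sketch are not comparable with the tables, which were measured between two machines.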

Table 1. Using NettyEchoServer and Chronicle’s EchoReconnectingClientMain via 1 Gb line
Throughput, 1 client	Typical	99.9%ile
200/s	290 μs	510 μs
500/s	280 μs	692 μs
1000/s	260 μs	670 μs
2000/s	260 μs	1,700 μs
3500/s	300 μs	6,500 μs

At this point, I found I couldn’t run the test for long, as the machine kept running out of resources. I could have given it more, but as you will see, opening a connection each time isn’t efficient even with a bit of tuning.

Using a low latency connection

One way to improve performance is to use a low latency network card such as the Solarflare 8522-PLUS. It is a 10 Gb card designed for low latency.

The ping time for this connection was 26 μs (as you will see, this is still pretty slow).

Table 2. Using NettyEchoServer and Chronicle’s EchoReconnectingClientMain over 10 Gb Solarflare
Throughput, 1 client	Typical	99.9%ile
200/s	160 μs	252 μs
500/s	150 μs	330 μs
1000/s	185 μs	360 μs
2000/s	135 μs	410 μs

This is a significant improvement without having to change the software.

Reusing connections

Reusing connections for streaming messages can achieve much higher throughputs and lower latencies.
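
A minimal sketch of the reuse pattern, again using only java.net: the socket is opened once and every message travels over it, so only the send and receive are timed. This is not the EchoClientMain used for the tables. The class name, the 32-byte message and the loopback echo server are assumptions for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

/** Hypothetical sketch: many requests over one long-lived TCP connection. */
final class ReuseConnection {

    /** Starts a daemon thread that echoes bytes back, one connection at a time. */
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread t = new Thread(() -> {
            byte[] buf = new byte[256];
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    s.setTcpNoDelay(true);
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        out.write(buf, 0, n);
                        out.flush();
                    }
                } catch (IOException e) {
                    // Server closed or client went away; the loop re-checks isClosed().
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    /** Opens the connection once, then times send + receive for each message. */
    static long[] measure(int port, int requests) throws IOException {
        long[] rttNanos = new long[requests];
        byte[] msg = new byte[32];
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.setTcpNoDelay(true); // send small messages immediately
            OutputStream out = s.getOutputStream();
            DataInputStream in = new DataInputStream(s.getInputStream());
            for (int i = 0; i < requests; i++) {
                long start = System.nanoTime();
                out.write(msg);
                in.readFully(msg); // block until the whole echo has arrived
                rttNanos[i] = System.nanoTime() - start;
            }
        }
        return rttNanos;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startEchoServer()) {
            long[] rtts = measure(server.getLocalPort(), 10_000);
            Arrays.sort(rtts);
            System.out.printf("median %.1f us%n", rtts[rtts.length / 2] / 1e3);
        }
    }
}
```

This sketch waits for each reply before sending the next message, so it measures latency rather than the paced throughput levels shown in the tables.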

Table 3. Using NettyEchoServer and Chronicle’s EchoClientMain via 10 Gb line
Throughput, 1 client	Typical	99.9%ile
20,000/s	21 μs	100 μs
30,000/s	23 μs	260 μs
40,000/s	31 μs	1,600 μs
50,000/s	110 μs	1,700 μs
60,000/s	na	na

The Echo Server using Netty was better than Chronicle Network for thousands of connections, but in this test, we have just one connection.

Table 4. Using Chronicle’s EchoServer2Main and EchoClientMain via 10 Gb line
Throughput, 1 client	Typical	99.9%ile
20,000/s	17 μs	71 μs
30,000/s	18 μs	84 μs
40,000/s	21 μs	110 μs
50,000/s	31 μs	205 μs
60,000/s	na	na

Lastly, to minimise latency, we use Solarflare’s onload, which enables a userspace TCP driver. This bypasses the kernel without our having to change the Java application to use Solarflare’s library.

Table 5. Using Chronicle’s EchoServer2Main and EchoClientMain via 10 Gb line with `onload`
Throughput, 1 client	Typical	99.9%ile
20,000/s	5.6 μs	13 μs
30,000/s	5.6 μs	15 μs
40,000/s	5.6 μs	16 μs
50,000/s	5.6 μs	17 μs
60,000/s	5.6 μs	18 μs
80,000/s	5.6 μs	21 μs
100,000/s	5.6 μs	24 μs
120,000/s	5.9 μs	30 μs
150,000/s	8.9 μs	55 μs

What if the services are on the same machine?

I would recommend using a shared memory ring buffer, e.g. Aeron, or a durable Chronicle Queue. These solutions offer even lower latencies and more consistent high-percentile timings.
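
To illustrate the underlying idea only, the sketch below maps the same file twice with java.nio and shows a value written through one mapping being read through the other. It does not use the Aeron or Chronicle Queue APIs, and it is not a ring buffer: it omits the framing, wrap-around and memory ordering those libraries handle. The class name and file layout are hypothetical. Visibility between mappings is what common platforms provide rather than something the Java specification guarantees.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: two mappings of one file acting as shared memory. */
final class SharedMemorySketch {

    /** Maps the first 'size' bytes of the file read-write. */
    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Writes a value through one mapping and reads it back through another. */
    static long roundTrip(long value) throws IOException {
        Path file = Files.createTempFile("shm-sketch", ".dat");
        file.toFile().deleteOnExit();
        MappedByteBuffer writer = map(file, 64); // stands in for the producing service
        MappedByteBuffer reader = map(file, 64); // stands in for the consuming service
        writer.putLong(0, value);
        return reader.getLong(0); // no socket or system call on this path
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read back: " + roundTrip(System.nanoTime()));
    }
}
```

In a real deployment the two mappings would live in separate processes, and a library would coordinate who may read or write each region.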

Conclusion

To reduce latency and increase throughput, I suggest:

Reuse connections
Use a low latency network card
Use kernel bypass/userspace drivers
Use our networking library designed for lower latencies

For the same servers, using the same version of Java, the throughput can increase from maybe 5,000 messages per second to 150,000 messages per second, while the round trip latency can drop from 300 μs to less than 6 μs.