Reducing network latency
A look down the rabbit hole of reducing network latency: how latency can be measured, and what you can do about it in a Java application.
Ping latency
A tool I used regularly when I was a Unix System Administrator was ping:
ping -f hostname
This sends a "flood" of packets across the network. It only sends 5 per second, but it gives you an idea of the round trip latency of the network.
I have two servers which are close to each other, so I should be able to get their latencies pretty low. ping reports an average latency of 107 μs (microseconds), which seems reasonable to start with.
All these timings are for round trip time (RTT), which is the time to go from a process in one machine to a process in a second machine and back again.
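The timings in this post are reported as a "typical" value and a 99.9th percentile. A minimal sketch of how such figures could be derived from recorded round trip times is shown here. It assumes "typical" means the median and uses the nearest-rank method; the class name `LatencyStats` and the placeholder workload are hypothetical and not taken from the benchmark used for the results.

```java
import java.util.Arrays;

public class LatencyStats {

    /** Returns the sample at the given fraction (0.0 to 1.0) using the nearest-rank method. */
    public static long percentile(long[] samplesNanos, double fraction) {
        if (samplesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesNanos.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(fraction * sorted.length); // 1-based rank
        int index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
        return sorted[index];
    }

    public static void main(String[] args) {
        long[] samples = new long[10_000];
        long sink = 0;
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            // Hypothetical placeholder: a real test would send a request
            // and wait for the reply here.
            sink += i;
            samples[i] = System.nanoTime() - start;
        }
        System.out.printf("typical: %.2f us, 99.9%%ile: %.2f us (sink=%d)%n",
                percentile(samples, 0.5) / 1e3,
                percentile(samples, 0.999) / 1e3,
                sink);
    }
}
```

Reporting a high percentile alongside the typical value matters because, as the tables in this post show, the tail can degrade long before the typical latency does.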
Java to Java latency with new TCP connections
Creating a new connection each time you want to send a request or a message is relatively expensive; however, many applications still work this way, so it is useful to get an idea of the latencies this incurs.
| Throughput, 1 client | Typical | 99.9%ile |
|---|---|---|
| 200/s | 290 μs | 510 μs |
| 500/s | 280 μs | 692 μs |
| 1000/s | 260 μs | 670 μs |
| 2000/s | 260 μs | 1,700 μs |
| 3500/s | 300 μs | 6,500 μs |
At this point, I found I couldn’t run the test for long, as the machine kept running out of resources. I could have given it more, but as you will see, opening a connection each time isn’t efficient even with a bit of tuning.
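To make the connection-per-request pattern concrete, here is a rough sketch that opens a new TCP connection for every message and times the whole exchange. It is not the benchmark used for the numbers above: the loopback echo server, the 32-byte message, the use of `TCP_NODELAY`, and the class name `ConnectPerRequest` are all assumptions made for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectPerRequest {

    /** Starts a single-threaded echo server on a free loopback port (illustrative only). */
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread thread = new Thread(() -> {
            byte[] buffer = new byte[64];
            while (!server.isClosed()) {
                try (Socket client = server.accept()) {
                    InputStream in = client.getInputStream();
                    OutputStream out = client.getOutputStream();
                    int n;
                    while ((n = in.read(buffer)) > 0) {
                        out.write(buffer, 0, n); // echo back whatever arrived
                        out.flush();
                    }
                } catch (IOException e) {
                    // Server socket closed or client went away; the loop condition decides.
                }
            }
        }, "echo-server");
        thread.setDaemon(true);
        thread.start();
        return server;
    }

    /** Opens a new connection, sends one message, waits for the echo, then closes. */
    public static long roundTripNanos(int port, byte[] message) throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            socket.setTcpNoDelay(true); // assumption: small messages, so disable Nagle
            OutputStream out = socket.getOutputStream();
            out.write(message);
            out.flush();
            new DataInputStream(socket.getInputStream()).readFully(new byte[message.length]);
        }
        return System.nanoTime() - start; // includes the cost of connect and close
    }

    /** Sends the given number of requests, one new connection each, and returns the timings. */
    public static long[] run(int requests) {
        try (ServerSocket server = startEchoServer()) {
            long[] rtt = new long[requests];
            byte[] message = new byte[32];
            for (int i = 0; i < requests; i++) {
                rtt[i] = roundTripNanos(server.getLocalPort(), message);
            }
            return rtt;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] rtt = run(1_000);
        long total = 0;
        for (long t : rtt) total += t;
        System.out.printf("mean round trip, new connection each time: %.1f us%n",
                total / 1e3 / rtt.length);
    }
}
```

Every call pays for connection setup and teardown in addition to the message exchange itself, which is the overhead this section measures.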
Using a low latency connection
One way to improve performance is to use a low latency network card such as the Solarflare 8522-PLUS. It is a 10 Gb card designed for low latency.
The ping time for this connection was 26 μs (as you will see, this is still pretty slow).
| Throughput, 1 client | Typical | 99.9%ile |
|---|---|---|
| 200/s | 160 μs | 252 μs |
| 500/s | 150 μs | 330 μs |
| 1000/s | 185 μs | 360 μs |
| 2000/s | 135 μs | 410 μs |
This is a significant improvement without having to change the software.
Reusing connections
Reusing connections for streaming messages can achieve much higher throughputs and lower latencies.
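As a rough illustration of connection reuse, the sketch below opens one TCP connection and sends every message over it, timing each round trip. Again, this is not the benchmark behind the published numbers: the loopback echo server, the 32-byte message, `TCP_NODELAY`, and the class name `ReusedConnection` are assumptions for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ReusedConnection {

    /** Starts a single-threaded echo server on a free loopback port (illustrative only). */
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread thread = new Thread(() -> {
            byte[] buffer = new byte[64];
            while (!server.isClosed()) {
                try (Socket client = server.accept()) {
                    InputStream in = client.getInputStream();
                    OutputStream out = client.getOutputStream();
                    int n;
                    while ((n = in.read(buffer)) > 0) {
                        out.write(buffer, 0, n); // echo back whatever arrived
                        out.flush();
                    }
                } catch (IOException e) {
                    // Server socket closed or client went away; the loop condition decides.
                }
            }
        }, "echo-server");
        thread.setDaemon(true);
        thread.start();
        return server;
    }

    /** Sends all requests over one long-lived connection and returns each round trip time. */
    public static long[] run(int requests) {
        try (ServerSocket server = startEchoServer();
             Socket socket = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
            socket.setTcpNoDelay(true); // assumption: small messages, so disable Nagle
            OutputStream out = socket.getOutputStream();
            DataInputStream in = new DataInputStream(socket.getInputStream());
            byte[] message = new byte[32];
            byte[] reply = new byte[message.length];
            long[] rtt = new long[requests];
            for (int i = 0; i < requests; i++) {
                long start = System.nanoTime();
                out.write(message);
                out.flush();
                in.readFully(reply); // block until the whole echo has come back
                rtt[i] = System.nanoTime() - start;
            }
            return rtt;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] rtt = run(100_000);
        long total = 0;
        for (long t : rtt) total += t;
        System.out.printf("mean round trip, reused connection: %.1f us%n",
                total / 1e3 / rtt.length);
    }
}
```

The connection is established once, outside the timed loop, so each measurement covers only the message exchange.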
| Throughput, 1 client | Typical | 99.9%ile |
|---|---|---|
| 20,000/s | 21 μs | 100 μs |
| 30,000/s | 23 μs | 260 μs |
| 40,000/s | 31 μs | 1,600 μs |
| 50,000/s | 110 μs | 1,700 μs |
| 60,000/s | na | na |
The Echo Server using Netty was better than Chronicle Network for thousands of connections, but in this test, we have just one connection.
| Throughput, 1 client | Typical | 99.9%ile |
|---|---|---|
| 20,000/s | 17 μs | 71 μs |
| 30,000/s | 18 μs | 84 μs |
| 40,000/s | 21 μs | 110 μs |
| 50,000/s | 31 μs | 205 μs |
| 60,000/s | na | na |
Lastly, to minimise latency, we use Solarflare’s onload which enables a userspace TCP driver, bypassing the kernel without having to change our Java application to use Solarflare’s library.
| Throughput, 1 client | Typical | 99.9%ile |
|---|---|---|
| 20,000/s | 5.6 μs | 13 μs |
| 30,000/s | 5.6 μs | 15 μs |
| 40,000/s | 5.6 μs | 16 μs |
| 50,000/s | 5.6 μs | 17 μs |
| 60,000/s | 5.6 μs | 18 μs |
| 80,000/s | 5.6 μs | 21 μs |
| 100,000/s | 5.6 μs | 24 μs |
| 120,000/s | 5.9 μs | 30 μs |
| 150,000/s | 8.9 μs | 55 μs |
What if the services are on the same machine?
I would recommend using a shared memory ring buffer, e.g. Aeron or a durable Chronicle Queue. These solutions have even lower latencies, and more consistent high percentile timings as well.
Conclusion
To reduce the latency and increase the throughput, I suggest:
- Reuse connections.
- Use a low latency network card.
- Use kernel bypass/userspace drivers.
- Use our networking library designed for lower latencies.
For the same servers, using the same version of Java, the throughput can increase from maybe 5,000 messages per second to 150,000 messages per second, while the round trip latency can drop from 300 μs to less than 6 μs.