[rabbitmq-discuss] Huge latency in Linux, compared with Leopard

Sat Sep 27 20:44:51 BST 2008

I've found a very interesting result.

The latency problem is caused by opening a connection and closing it again
for every message.

If you reuse a connection for subsequent messages, the latency is almost the
same as what MulticastMain shows, i.e. a few hundred microseconds. But if you
reconnect each time you publish a message, the latency is about 40ms, and
only on Linux.

I still don't know why Linux & Erlang (40ms) is 10 times slower than
Leopard & Erlang (4ms) when we re-establish a connection before publishing
each message. As the two smart guys in this thread suggested, we need to dig
into the kernel level. I don't know yet which combination of kernel
parameters affects this result, so I will have to keep digging.
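
To make the two cases concrete, here is a minimal sketch of the comparison.
It is an illustration only: it uses plain java.net sockets against a local
echo server rather than the RabbitMQ Java client, and the class and method
names (ReconnectCost, averageReusing, averageReconnecting) are made up for
this example. A real AMQP connection also performs a protocol handshake after
the TCP connect, so this sketch only covers the TCP part of the cost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustration only: plain TCP against a local echo server, not the RabbitMQ client.
class ReconnectCost {

    // Starts a daemon thread that accepts one connection at a time and echoes bytes back.
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread t = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int b;
                    while ((b = in.read()) != -1) {
                        out.write(b);
                        out.flush();
                    }
                } catch (IOException e) {
                    // Server closed or client went away; the loop condition decides what happens next.
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    // One request/response over an already open socket; returns elapsed nanoseconds.
    static long roundTrip(Socket s) throws IOException {
        long start = System.nanoTime();
        s.getOutputStream().write(42);
        s.getOutputStream().flush();
        if (s.getInputStream().read() != 42) {
            throw new IOException("unexpected echo");
        }
        return System.nanoTime() - start;
    }

    // Average time per message (ns) when a single connection is reused.
    static long averageReusing(int port, int messages) throws IOException {
        long total = 0;
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            for (int i = 0; i < messages; i++) {
                total += roundTrip(s);
            }
        }
        return total / messages;
    }

    // Average time per message (ns) when each message pays for connect, send and close.
    static long averageReconnecting(int port, int messages) throws IOException {
        long total = 0;
        for (int i = 0; i < messages; i++) {
            long start = System.nanoTime();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
                roundTrip(s);
            }
            total += System.nanoTime() - start;
        }
        return total / messages;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startEchoServer()) {
            int port = server.getLocalPort();
            System.out.println("reused connection: " + averageReusing(port, 200) + " ns/msg");
            System.out.println("reconnect per msg: " + averageReconnecting(port, 200) + " ns/msg");
        }
    }
}
```

The absolute numbers depend on the machine and on JVM warm-up, so only the
difference between the two lines is of interest.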

Holger Hoffstätte-2 wrote:
> 
> 
> This is just too interesting to resist :)
> 
> Matthias Radestock wrote:
>> Bogon Choi wrote:
>>> I am using RabbitMQ Java library to talk with RabbitMQ Server.
>> 
>> I have just run a test on one of our Debian Linux machines here - 
>> kernel 2.6.24-1-686, "Intel(R) Xeon(TM) CPU 2.80GHz stepping 09". The 
>> test uses the MulticastMain example that ships with the Java client to 
>> send a 1k message every second and measure the latency:
>> 
>> sh runjava.sh com.rabbitmq.examples.MulticastMain -a -r 1 -s 1024 -i 5
>> 
>> Once the system has settled down I get minimum latencies of around 900 
>> microseconds, and average latencies of about 1050 microseconds.
>> 
>> Do you see the same results on your system when running the same test?
> 
> Basically yes, though with a couple of tricks I have been able to get
> minimum & average latency <300us, see below (best value was 266us!)
> 
>> There is the occasional blip that produces max latencies of around 40ms. 
>> I always thought that was most likely due to the fact that the system is 
>> doing other things - it's my desktop machine - but given that the figure 
>> is the same that you are reporting perhaps that is not the case.
> 
> I thought so too but got the same 40ms blips on both a single-CPU machine
> with other (mostly idle) processes, and my completely idle dual-core
> laptop. Both are running 2.6.26.5, rabbit 1.4 and erlang 12.2.4.
> 
> Some findings:
> 
> - kernel settings matter, but not as much as one would think. My server
> runs at 250 HZ & voluntary preemption, whereas the laptop runs with full
> preemption at 300 HZ - however both exhibit very similar symptoms, and I
> strongly suspect you'd see the same at 1000 HZ or with the RT kernel
> (still need to try that one). Keep in mind that different distributions
> have patched kernels to varying degrees (especially RedHat) and that the
> relatively new CFS CPU scheduler (new in 2.6.23 IIRC) has had a lot of
> performance oddities since its introduction. My understanding from
> following the kernel list is that most of these should be fixed in the
> current 2.6.26 kernel. However, the variance between the average (~1ms)
> and max. latency (40ms) is IMHO just way too big for an occasional
> mis-schedule, so something else must be wrong. Besides, we all get the
> same 40ms penalty, so that is a good sign that it's not the kernel
> scheduler per se.
> 
> - eliminate any JVM latency/threading oddities. Adjusting some VM flags
> can make sure you don't get surprised by HotSpot dynamically recompiling
> itself, the GC stopping the world etc. You can actually see the native
> method compiler kick in and the latencies decrease if you watch closely.
> So to avoid any interference by the JVM I used JDK 6 and:
> 
>   -XX:CompileThreshold=10 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> 
> This will (in order) compile methods to native code after 10 invocations,
> use a second thread for collecting the young generation, and not
> block-the-world when doing any major collections. You will see that the
> latencies go down quite a bit after a few messages. It won't fix any
> problems with Rabbit or the Erlang VM, but it reduces latency jitter on
> the client side.
> 
> - I noticed that increasing the rate (-r) to 10 gave me more spikes, and
> -r 100 ran with 40ms max latency all the time. This got *much* better
> without -a (auto-ack) so something with the ack handling seemed to trigger
> the behaviour.
> 
> - the fixed penalty for small packets reminded me of good old Mr. Nagle
> who is not your friend when it comes to latency..and behold! Setting
> TCP_NODELAY in both the Java client's SocketFrameHandler and the Rabbit
> startup script (as documented in inet: {nodelay, Boolean}) did the trick,
> even *with* auto-ack!
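
For reference, a minimal sketch of the client-side half of that change,
assuming nothing about how SocketFrameHandler is actually written: it only
shows the standard java.net.Socket call that sets TCP_NODELAY. The class and
method names (NoDelayExample, openWithNoDelay) are hypothetical, and the
local listener exists only so the example runs without a broker. The server
side is the inet option {nodelay, true} mentioned in the bullet above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustration only: the standard socket option, not the Java client's actual code.
class NoDelayExample {

    // Opens a socket with Nagle's algorithm disabled (TCP_NODELAY set).
    static Socket openWithNoDelay(InetAddress host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        socket.setTcpNoDelay(true); // send small writes immediately instead of coalescing them
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Throwaway local listener standing in for the broker.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = openWithNoDelay(server.getInetAddress(), server.getLocalPort())) {
            System.out.println("TCP_NODELAY enabled: " + socket.getTcpNoDelay());
        }
    }
}
```

The option has to be set on both ends of the connection to remove the delay
in both directions, which matches the two places changed above.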
> 
> With this setup even my single-CPU box has only a handful of latency blips
> at -r 100 over a longer period of time, with a much smaller variance than
> before (a very rare max. ~12k us) which might as well be my
> cron/fetchmail/tomcat waking up. On the dual-core laptop the latencies are
> all ~350/750/900 with the very occasional 1500us max. blip.
> 
> I have no idea how exactly the auto-ack works, but I suspect there is just
> some bad interaction with auto-ack, beam's own internal process
> scheduling, the tcp writer and possibly some inet options like e.g.
> {delay_send, Boolean}.
> 
> Not sure if this helps but maybe it will give you some ideas for further
> testing.
> 
> regards
> Holger
> 
> (I really need to build me an -rt kernel.. :)
> 
