[rabbitmq-discuss] Federated exchange slowdown

Wed Mar 21 15:37:02 GMT 2012

Hi Simone. I'm afraid I still can't replicate this. I was using iptables 
to interrupt connectivity between two brokers on the same machine, but 
apart from that I think I was doing the same thing.

One thing I noticed was that interrupting a connection that did not have 
heartbeating turned on meant the connection did not recover well unless 
manually killed. Do heartbeats help for you?
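
To make the heartbeat question concrete, here is a minimal sketch of the usual dead-peer rule, assuming the convention from the AMQP 0-9-1 specification that a peer is treated as unreachable after about two heartbeat intervals of silence. The function name and the timings are hypothetical, and this is not the broker's actual implementation. It only illustrates why a silently dropped link is never noticed when heartbeating is off.

```python
from typing import Optional


def peer_presumed_dead(last_rx: float, now: float, heartbeat: Optional[float]) -> bool:
    """Return True once the peer has been silent for two heartbeat intervals.

    heartbeat=None (or 0) models a connection with heartbeating turned off:
    silence alone never marks the peer as dead.
    """
    if not heartbeat:
        return False
    return (now - last_rx) >= 2 * heartbeat


# The link goes silent at t=0; compare a 10 s heartbeat with no heartbeat.
for t in (5.0, 15.0, 25.0, 3600.0):
    with_hb = peer_presumed_dead(0.0, t, heartbeat=10.0)
    without_hb = peer_presumed_dead(0.0, t, heartbeat=None)
    print(f"t={t:>6}s  heartbeat=10s: {with_hb}  no heartbeat: {without_hb}")
```

Without a heartbeat, the only other signal is a TCP-level error, which can take a long time to surface on a connection that is idle or blocked.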

Also, what happens if you kill the federation connection when your test 
is stuck?

Cheers, Simon

On 20/03/12 23:03, Simone Busoli wrote:
> Hi Simon, I'm confused as well. Please take my previous observation with
> a grain of salt as I was publishing somewhat big messages (100KB) at a
> rate that perhaps couldn't be handled very well by the network. As soon
> as I reduced the size to one tenth everything started going more
> smoothly, for as long as something weird didn't happen to the network.
> Here's the test environment I've set up to reproduce the issue: two
> physical machines connected each to a network switch (all devices are
> 100Mb), with the two switches connected to each other. Of course B1 and
> B2 live on the two machines, respectively. A bunch of consumers on each
> side. I'm faking a network latency of a few hundred milliseconds via
> software, and publishing 20 msg/s of 10 kB each on one side of a
> bidirectional federation exchange seems to work fine.
> If I disconnect the network cable which connects the two switches and
> plug it back in a few seconds later, the federated exchanges never
> catch up with the messages queued up so far. In particular, the
> outgoing queue on the broker acting as the downstream in this scenario
> (the one whose messages would then be discarded by the plugin once back
> on the publisher side) no longer seems to be emptied by anyone.
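
For scale, a rough back-of-the-envelope sketch of the load described above. It assumes "100Mb" means 100 Mbit/s, that 1 kB is 1000 bytes, and a hypothetical 5 second cable pull (the message only says "a few seconds"). Protocol overhead is ignored.

```python
def link_share(msg_rate: float, msg_bytes: int, link_bits_per_s: float) -> float:
    """Fraction of the link used by message payload alone."""
    return msg_rate * msg_bytes * 8 / link_bits_per_s


def backlog_bytes(msg_rate: float, msg_bytes: int, outage_s: float) -> float:
    """Payload that accumulates while the link is down."""
    return msg_rate * msg_bytes * outage_s


MSG_RATE = 20                  # msg/s, from the test description
MSG_BYTES = 10_000             # 10 kB, assuming 1 kB = 1000 bytes
LINK_BITS_PER_S = 100_000_000  # assuming "100Mb" means 100 Mbit/s
OUTAGE_S = 5                   # hypothetical outage length

share = link_share(MSG_RATE, MSG_BYTES, LINK_BITS_PER_S)
backlog = backlog_bytes(MSG_RATE, MSG_BYTES, OUTAGE_S)
drain_s = backlog * 8 / LINK_BITS_PER_S  # time to send the backlog at line rate

print(f"payload uses {share:.1%} of the link")
print(f"backlog after outage: {backlog / 1e6:.1f} MB, {drain_s:.2f} s at line rate")
```

Under these assumptions the payload is a small fraction of the link and the backlog is tiny, so raw bandwidth alone would not explain a queue that never drains.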
>
> One other thing I noticed during the failure in the production
> environment is that deleting this automatically created queue (i.e.
> the queue B2 -> B1, with B1 being the originator of the messages)
> affected the queue B1 -> B2, which went from a slow delivery rate to
> stopping completely. Now, as far as I understand, each direction of an
> exchange federated between two brokers in both directions should be
> independent of the other, but I experienced exactly this, and given that
> a unidirectionally federated exchange under the same conditions
> described above works just fine I am wondering whether connecting two
> exchanges in this way implies some weird behavior in which each side
> influences the other in a sort of cascading behavior which leads to a
> deadlock.
>
> Although unlikely to be the cause of this, I'm wondering if using
> prefetch-count, which I'm using, could lead to this behavior. For
> example after the network bounce mentioned above the prefetch count was
> reached on both sides, so no more deliveries would be done until acks
> arrived from the other side. Could this lead to a deadlock in which
> each side is waiting for the other to send acknowledgements before
> sending any more messages?
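
The prefetch question can be explored with a toy model. This is only a sketch: the `Link` class, the prefetch value and the round-trip time are all hypothetical, and it does not reproduce how the federation plugin is implemented. It models one direction as a window of unacknowledged messages and shows two things: deliveries stop once the window is full and no acks arrive, and the sustainable rate is bounded by roughly prefetch / round-trip time.

```python
class Link:
    """One direction of a federation-like link with a prefetch window (toy model)."""

    def __init__(self, prefetch: int) -> None:
        self.prefetch = prefetch
        self.queued = 0   # messages waiting in the upstream queue
        self.unacked = 0  # delivered but not yet acknowledged

    def publish(self, n: int) -> None:
        self.queued += n

    def deliver(self) -> int:
        """Deliver as many queued messages as the window allows."""
        n = min(self.queued, self.prefetch - self.unacked)
        self.queued -= n
        self.unacked += n
        return n

    def ack(self, n: int) -> None:
        """Acknowledgements arriving from the consuming side."""
        self.unacked -= min(n, self.unacked)


def max_rate(prefetch: int, rtt_s: float) -> float:
    """Upper bound on msg/s when every message occupies a slot for one round trip."""
    return prefetch / rtt_s


link = Link(prefetch=10)  # hypothetical prefetch-count
link.publish(100)
first = link.deliver()    # fills the window
stalled = link.deliver()  # nothing more goes out until acks arrive
link.ack(10)
resumed = link.deliver()  # acks free the window again
print(first, stalled, resumed)
print(max_rate(prefetch=5, rtt_s=0.3))
```

In this model each direction has its own window, so two full windows do not block each other; a stall needs the acks for one direction to be lost or never sent. The rate bound is worth checking against the real settings, though: with a hypothetical prefetch of 5 and a 300 ms round trip it is about 16.7 msg/s, which is below a 20 msg/s publish rate and would mean a backlog never drains.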
>
> On Tue, Mar 20, 2012 at 13:21, Simon MacMullen <simon at rabbitmq.com> wrote:
>
>     On 19/03/12 14:00, Busoli, Simone wrote:
>
>         Hi Simon, I think I've mostly tracked down the issue to the
>         symmetric setup of the federated exchanges between the two
>         brokers. I noticed that whenever I start publishing messages to
>         an exchange configured that way the network starts behaving in
>         surprising ways. For instance, I can no longer get two machines
>         connected directly by two network switches to ping each other.
>         Stop publishing messages and everything goes back to a normal
>         state. A federated exchange which goes in one direction only, as
>         well as a shovel, instead behave just fine.
>
>
>     I am very confused by this.
>
>     Really, nothing the federation plugin does should be able to stop
>     ICMP pings from working, it's just a TCP connection after all.
>
>     The only thing I can think of is that somehow federation is going
>     mad and flooding the link with traffic - but I'm sure you would have
>     noticed that. And pings should still get through anyway.
>
>     Does wireshark / tcpdump / etc show anything unusual?
>
>     Cheers, Simon
>
>
>     --
>     Simon MacMullen
>     RabbitMQ, VMware
>     _________________________________________________
>     rabbitmq-discuss mailing list
>     rabbitmq-discuss at lists.rabbitmq.com
>     https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>

-- 
Simon MacMullen
RabbitMQ, VMware