<font face="trebuchet ms,sans-serif">Hi Simon, I'm confused as well. Please take my previous observation with a grain of salt, as I was publishing fairly big messages (100 kB) at a rate that the network perhaps couldn't handle very well. As soon as I reduced the size to one tenth, everything started going more smoothly, as long as nothing weird happened to the network. Here's the test environment I've set up to reproduce the issue: two physical machines, each connected to a network switch (all devices are 100 Mb), with the two switches connected to each other. B1 and B2 each live on one of the two machines, with a bunch of consumers on each side. I'm faking a network latency of a few hundred milliseconds </font><span style="font-family:'trebuchet ms',sans-serif">via software, and publishing 20 msg/s of 10 kB each on one side of a bidirectional federation exchange seems to work fine. </span><div>
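<div><span style="font-family:'trebuchet ms',sans-serif">As a rough sanity check on the numbers above, here is a small back-of-the-envelope sketch. It assumes that "100 Mb" means a 100 Mbit/s link and that 1 kB is 1000 bytes, and it counts message payload only, ignoring AMQP/TCP overhead and traffic in the other direction. The function name is made up for illustration.</span></div>

```python
def payload_mbit_per_s(msgs_per_s, msg_size_kb):
    """Payload rate in Mbit/s, assuming 1 kB = 1000 bytes (overhead ignored)."""
    return msgs_per_s * msg_size_kb * 1000 * 8 / 1_000_000


LINK_MBIT_PER_S = 100  # assumption: "100 Mb" devices means 100 Mbit/s

rate = payload_mbit_per_s(msgs_per_s=20, msg_size_kb=10)
utilisation = rate / LINK_MBIT_PER_S

print(f"{rate:.1f} Mbit/s, {utilisation:.1%} of the link")
```

<div><span style="font-family:'trebuchet ms',sans-serif">Under those assumptions the payload alone uses only a small fraction of the link, so raw bandwidth is unlikely to be the limit in this test.</span></div>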
<div><span style="font-family:'trebuchet ms',sans-serif">If I disconnect the network cable that connects the two switches and plug it back in a few seconds later, the federated exchanges never catch up with the messages queued up so far. In particular, the outgoing queue on the broker acting as the downstream in this scenario (the one whose messages would then be discarded by the plugin once back on the publisher side) no longer seems to be emptied by anyone.</span></div>
<div><span style="font-family:'trebuchet ms',sans-serif"><br></span></div><div><span style="font-family:'trebuchet ms',sans-serif">One other thing I noticed during the failure in the production environment is that deleting this automatically created queue (i.e. the queue B2 -> B1, with B1 being the originator of the messages) affected the queue B1 -> B2, which went from a slow delivery rate to stopping completely. As far as I understand, each direction of an exchange federated between two brokers in both directions should be independent of the other, but this is exactly what I experienced. Given that a unidirectionally federated exchange works just fine under the same conditions described above, I am wondering whether connecting two exchanges in this way causes each side to influence the other in a sort of cascade that leads to a deadlock. </span></div>
<div><span style="font-family:'trebuchet ms',sans-serif"><br></span></div><div><span style="font-family:'trebuchet ms',sans-serif">Although it is unlikely to be the cause, I'm wondering whether prefetch-count, which I'm using, could lead to this behavior. For example, after the network bounce mentioned above the prefetch count was reached on both sides, so no more deliveries would be made until acks arrived from the other side. Could this lead to a deadlock in which each side is waiting for the other to send acknowledgements before sending any more messages?</span></div>
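<div><span style="font-family:'trebuchet ms',sans-serif"><br></span></div><div><span style="font-family:'trebuchet ms',sans-serif">To make the question concrete, here is a toy model of the situation I have in mind. It is not RabbitMQ or federation plugin code, and the class, the function and all the numbers are made up for illustration. Each direction has a prefetch window, and the sketch assumes both windows are already full after the bounce. It only shows that a full window stalls for good if acks never arrive, and drains normally if they do. It says nothing about whether the plugin can actually end up in the first case.</span></div>

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One direction of a toy bidirectional link (hypothetical model)."""
    prefetch: int        # maximum number of unacknowledged deliveries
    backlog: int         # messages still waiting to be delivered
    unacked: int = 0     # delivered but not yet acknowledged

    def deliver(self):
        """Deliver as many messages as the prefetch window allows."""
        free_slots = max(self.prefetch - self.unacked, 0)
        n = min(self.backlog, free_slots)
        self.backlog -= n
        self.unacked += n
        return n

    def ack(self, count):
        """Acknowledgements free slots in the prefetch window."""
        self.unacked -= min(count, self.unacked)


def run(acks_arrive, rounds=50):
    """Simulate both directions; return the remaining backlog of each."""
    # Assumed starting point: both windows are already full after the bounce.
    b1_to_b2 = Link(prefetch=10, backlog=100, unacked=10)
    b2_to_b1 = Link(prefetch=10, backlog=100, unacked=10)
    for _ in range(rounds):
        for link in (b1_to_b2, b2_to_b1):
            if acks_arrive:
                link.ack(link.unacked)  # the other side acks all it holds
            link.deliver()
    return b1_to_b2.backlog, b2_to_b1.backlog


print(run(acks_arrive=True))   # (0, 0): both backlogs drain
print(run(acks_arrive=False))  # (100, 100): neither side ever moves
```

<div><span style="font-family:'trebuchet ms',sans-serif">In this model the two directions only get stuck together if the acks for the messages in flight are lost in both directions and nothing requeues or redelivers them, which is the case I'm asking about.</span></div>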
<div><br><div class="gmail_quote">On Tue, Mar 20, 2012 at 13:21, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com">simon@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 19/03/12 14:00, Busoli, Simone wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Simon, I think I've mostly tracked down the issue to the symmetric<br>
setup of the federated exchanges between the two brokers. I noticed<br>
that whenever I start publishing messages to an exchange configured<br>
that way the network starts behaving in surprising ways. For<br>
instance, I can no longer get two machines connected directly by two<br>
network switches to ping each other. Stop publishing messages and<br>
everything goes back to a normal state. A federated exchange which<br>
goes in one direction only as well as a shovel instead behave just<br>
fine.<br>
</blockquote>
<br></div>
I am very confused by this.<br>
<br>
Really, nothing the federation plugin does should be able to stop ICMP pings from working, it's just a TCP connection after all.<br>
<br>
The only thing I can think of is that somehow federation is going mad and flooding the link with traffic - but I'm sure you would have noticed that. And pings should still get through anyway.<br>
<br>
Does wireshark / tcpdump / etc show anything unusual?<br>
<br>
Cheers, Simon<div class="HOEnZb"><div class="h5"><br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, VMware<br>
_______________________________________________<br>
rabbitmq-discuss mailing list<br>
<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com" target="_blank">rabbitmq-discuss@lists.rabbitmq.com</a><br>
<a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br>
</div></div></blockquote></div><br></div></div>