[rabbitmq-discuss] Federated exchange slowdown
Simon MacMullen
simon at rabbitmq.com
Mon Mar 19 11:25:50 GMT 2012
Hmm. Well, this *should* be very simple - the federation plugin is a bit
tricky in terms of how it gets everything set up, but once the link is
up it really just consumes and republishes the messages.
Assuming you have an even slightly fast network link I can't imagine why
either federation queue would end up with messages backing up.
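
A rough back-of-envelope check, offered only as a sketch: with a prefetch limit, at most that many messages can be unacknowledged at once, and each slot needs at least one network round trip to free up. The round-trip time below is an assumption derived from the 15000 km distance mentioned in the quoted message and a typical signal speed in fibre of about 200000 km/s; the real link will be slower. `max_rate` is a hypothetical helper name, not part of RabbitMQ.

```python
# Upper bound on throughput for a consumer limited by a prefetch count:
# at most `prefetch` messages are in flight, and each needs one round
# trip (delivery out, acknowledgement back) before its slot is reused.

def max_rate(prefetch: int, rtt_seconds: float) -> float:
    """Best-case messages per second for a prefetch-limited link."""
    return prefetch / rtt_seconds

# Assumed figures: 15000 km each way at roughly 200000 km/s in fibre.
rtt = 2 * 15000 / 200000          # about 0.15 s, a lower bound on the real RTT
published = 3 + 10                # msg/sec on X1 plus X2, from the quoted report

print(max_rate(100, rtt) > published)   # the bound sits far above the publish rate
```

Even if the real round trip were ten times worse than assumed, the bound would still be several times the reported publish rate, which is why the network alone is an unlikely explanation.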
So assuming you are not network bound we could start by figuring out
what is stopping the messages getting delivered immediately. If the
prefetch count is 100, does that mean the queue is showing 100 messages
unacked (i.e. the delay is happening because the federated exchange is
not consuming fast enough), or fewer (i.e. the broker is just not
delivering the messages for some reason)?
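
To make the two cases concrete, here is a minimal sketch of the check, assuming you read the "ready" and "unacked" counts for the outgoing queue off the management UI by hand. `diagnose` and its return strings are hypothetical names for illustration, not anything the broker provides.

```python
def diagnose(ready: int, unacked: int, prefetch: int = 100) -> str:
    """Classify a backlog on a federation queue from its message counts."""
    if ready == 0:
        # Nothing is waiting in the queue, so there is no backlog to explain.
        return "no backlog"
    if unacked >= prefetch:
        # The prefetch window is full: the broker cannot deliver more until
        # the federated exchange acknowledges what it already holds.
        return "consumer not acking fast enough"
    # Messages are waiting even though the prefetch window has room.
    return "broker not delivering"

print(diagnose(ready=300, unacked=100))
print(diagnose(ready=300, unacked=20))
```

The first call corresponds to the queue showing 100 unacked, the second to it showing fewer.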
Cheers, Simon
On 18/03/12 11:42, Simone Busoli wrote:
> Following up on this issue, the problem has worsened to a great extent.
> Here's a bit more information about the setup.
> Two brokers B1 and B2, running version 2.7.1 and Erlang R14B04 on
> Windows, about 15000 km away from each other. Both machines running the
> brokers are physical machines with several cores, lots of RAM, fast
> disks, and reported no problems in disk access, CPU or memory consumption.
>
> We have two federated durable topic exchanges X1 and X2; the upstream
> connection to the other broker is configured with a heartbeat of a few
> seconds and the prefetch count is 100. X1 is bidirectionally federated
> with max_hops = 1 to prevent messages from going in loops (which
> nonetheless suffer the additional roundtrip before being discarded, a
> problem I reported a while ago), X2 unidirectionally federated from B1
> to B2. Messages are all published on B1, all those published on X1 are
> persistent, while those published on X2 aren't. Messages are published
> on X1 with an average rate of 3 msg/sec, and on X2 with an average rate
> of 10 msg/sec.
>
> What we experienced is that messages published on X1 queue up on the
> outgoing queue on B1, and even worse on the outgoing queue on B2
> (although we don't care about them as they would be discarded once
> delivered thanks to the max_hops setting). Looking at the Web UI it
> appears that the acknowledgements for those messages come in more slowly
> than messages are published, but I really can't understand why this
> might be the case, given the very low publish rate. The visible effect
> is a delay of even several minutes between when a message is published
> on B1 and received on B2, but again, only for the low-rate messages
> published on X1 (perhaps even worse for the other direction if the
> messages were not thrown away by the federation because of the max_hops
> setting, as the outgoing queue on B2 has several thousands of messages
> versus the few hundreds of that on B1). In other words, the only
> difference between the two federated exchanges appears to be the
> bidirectionality of the first versus the unidirectionality of the second
> and the fact that those published on the first are persistent.
> The logs of the two brokers contain apparently no relevant information,
> and the network link between the two has always appeared to be in a good
> state.
--
Simon MacMullen
RabbitMQ, VMware