[rabbitmq-discuss] Shovel stops receiving acks from cluster

Matthias Radestock matthias at rabbitmq.com
Mon Aug 27 14:37:15 BST 2012


Jon,

On 23/08/12 16:37, Jon Bergli Heier wrote:
> Attached the logs as requested, only 3 attempts this time (exact same setup
> as yesterday, restart order was qa-test1, qa-test2, qa-test1).

Thanks. We have found a race condition that triggers the behaviour you 
are seeing when the RabbitMQ server is processing a publish that gets 
routed to a queue which has a slave (re)starting at the same time.

As well as preventing confirms and tx.commits, the bug also results in 
memory growth in the slaves.

Looks like this bug has been around since 2.8.0, though due to it being 
a race condition it is conceivable that changes in more recent releases 
have increased the probability of it occurring.

We are working on a fix; can't think of a workaround in the meantime, 
unfortunately.

Thanks again for supplying all the debug info; this was instrumental in 
us being able to identify the root cause.

Regards,

Matthias.


More information about the rabbitmq-discuss mailing list