[rabbitmq-discuss] Federation/Connection issues

Ganann, Kale KGanann at kroll.com
Tue Mar 25 00:11:42 GMT 2014


Hi all,

Our Rabbit had a bit of an accident this weekend: the 3.2.1 cluster was accidentally rebooted all at once. When Mnesia went down on the second node, it crashed hard enough to drop the disk. We still had the JSON backups on the other two boxes, but neither of those nodes would start without the master, so rather than fight them I decided to pull the backup and rebuild the cluster. The rebuilt cluster came up on 3.2.4, the latest build. I restored the backup and everything seemed fine. At first.

By the next morning the master node of the cluster had just over 1,020,000 Erlang processes running and was deep in the red. Federation wasn't working, so we made the call to cleanly shut down the cluster and bring it back up. Running rabbitmqctl against the node with the runaway processes produced erl_crash dumps, and that node had to be rebooted. Another node took over as master, and that was when we noticed the issue.

Federation is spinning up internal connections left, right and center. It had spawned 56,000 connections on the node that failed overnight. We tried to rebuild the cluster, but as soon as we loaded the backup the same thing happened: connections started spawning, even with the cluster isolated from the network and from any real outside connections. We tried rebuilding on 3.2.1 and then on 3.2.0, we tried old backups from November and October of last year, and we tried restoring backups from our Dev and Stage environments. Every one of them produces the same runaway connections.

Finally, we disabled the federation plugin and restarted Rabbit, and the connections stopped. We then tried re-enabling the plugin and restarting again. On 3.2.1 the connection count skyrockets again. On 3.2.4 it still climbs, but much more slowly: a few dozen connections a minute instead of hundreds or thousands.

I've attached the output of rabbitmqctl report from one of our nodes, captured while the process count was climbing. We've disabled federation in prod for now, but we're looking for a better solution. Ideas?

Thank you very much,
Kale Ganann
-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.zip
Type: application/x-zip-compressed
Size: 783348 bytes
Desc: report.zip
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20140325/b41d5fcc/attachment-0001.bin>

