[rabbitmq-discuss] Node crash, then cluster collapse
carl.hoerberg at gmail.com
Thu Jun 6 16:28:11 BST 2013
Alright, thanks, I'll try to contact you on IRC next time this happens. I've updated the gist with some logs from another node; the third node produced *a lot*, so I'll have to email those to you.
On Thursday 6 June 2013 at 18:07, Tim Watson-5 [via RabbitMQ] wrote:
> On 5 Jun 2013, at 13:56, carlhoerberg <[hidden email]> wrote:
> > On a three-node cluster, one EC2 machine reboots unexpectedly, and when it
> > starts up again RabbitMQ fails to start. I've put all logs here:
> > https://gist.github.com/carlhoerberg/ff6c6bd4f7639bf4b2f5
> That seems to contain only the logs from one node. What about the others?
> > When the troubled node is restarted manually again, it's unable to join:
> > it stops at "adding mirrors" and stays there forever.
> > The other nodes now start to behave weirdly too: new queues can't be declared,
> > but existing queues seem to continue delivering messages. They also fail to
> > respond to "rabbitmqctl status" or /api/overview, so I'm forced to stop them
> > with "kill -9". Only when all nodes are stopped can the cluster be brought
> > up again normally.
> If you kill -9 the nodes, it's a bit tricky to get live info for diagnosis, assuming there's nothing in the logs. If the logs are available, please post them. Next time this happens, jump on IRC (the #rabbitmq channel on Freenode) and we can try a few things to diagnose what's going on. If you can arrange for me to have ssh access to these nodes whilst the symptoms are present, I'll be more likely to solve the issue quickly; we might be able to sign some kind of privacy agreement if necessary.
> Also, please post your full setup whenever possible, detailing which plugins you're using (if any) and what kind of HA setup you're using.
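A minimal sketch, not from the thread: before a hung node is stopped with kill -9, it may help to record whether it still answers at all, since that live state is lost afterwards. The helper names `probe_overview` and `capture_ctl` are hypothetical; the sketch assumes the management plugin is listening on its default port 15672 with the default guest/guest login, and that `rabbitmqctl` is on the PATH.

```python
# Hypothetical helpers, not from this thread: record whether a RabbitMQ node
# still answers before it is stopped with kill -9.
import base64
import subprocess
import urllib.error
import urllib.request


def probe_overview(base_url="http://localhost:15672", user="guest",
                   password="guest", timeout=5.0):
    """GET /api/overview and return 'ok', 'http-error:<code>' or 'unreachable'.

    Assumes the management plugin on its default port and guest/guest login.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(base_url + "/api/overview",
                                 headers={"Authorization": "Basic " + token})
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as exc:  # node answered, but with an error
        return f"http-error:{exc.code}"
    except OSError:  # refused, DNS failure, or no answer within `timeout`
        return "unreachable"


def capture_ctl(args=("status",), timeout=30.0):
    """Run `rabbitmqctl <args>` with a timeout and return its combined output."""
    cmd = ["rabbitmqctl", *args]
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
    except FileNotFoundError:
        return "rabbitmqctl not found on PATH"
    except subprocess.TimeoutExpired:
        return f"{' '.join(cmd)} gave no answer within {timeout}s"
    return done.stdout + done.stderr


if __name__ == "__main__":
    # Save this output somewhere safe before deciding to kill -9 the node.
    print("/api/overview:", probe_overview())
    print(capture_ctl())
```

The timeouts keep the check itself from hanging the way `rabbitmqctl status` did in the report above, and the result distinguishes a node that refuses connections from one that accepts them but never replies.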