[rabbitmq-discuss] Rabbit startup command is hanging
Jason McIntosh
mcintoshj at gmail.com
Fri Apr 11 15:20:29 BST 2014
So, a final email on this. I ended up having to kill all the processes on
all nodes in the cluster, then start them back up in order to recover.
At that point, the node that wouldn't rejoin the cluster came online and
started syncing messages and responding fine. I'm guessing I had a
deadlock someplace, though I'm not totally sure where it would be. I'll
keep an eye on this and see what else I can discover. *SIGH* I really need
to learn to debug and work with Erlang better.
Thanks all,
Jason
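
For anyone finding this thread later, the recovery described above (stop every node, then bring them back one at a time) could be sketched roughly as follows. This is only an illustration under several assumptions that the thread does not confirm: that the boxes are reachable over ssh under the hostnames rabbitmqm10, rabbitmqm11 and rabbitmqm12, that the init script can actually stop a wedged node (the processes here had to be killed), and that the nodes are started in the reverse of the order they were stopped, which is what RabbitMQ's clustering guide advises for a full-cluster restart. The helper names run_on and restart_cluster are hypothetical. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch: full stop, then ordered restart, of a three-node RabbitMQ cluster.
# ASSUMPTIONS (not from the thread): ssh access to each host, these exact
# hostnames, and reverse-of-stop start order.

STOP_ORDER="rabbitmqm10 rabbitmqm11 rabbitmqm12"
START_ORDER="rabbitmqm12 rabbitmqm11 rabbitmqm10"

# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command on a host, or just print it.
run_on() {
    target="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "[$target] $*"
    else
        ssh "$target" "$@"
    fi
}

restart_cluster() {
    # Stop every node before starting any of them.
    for host in $STOP_ORDER; do
        run_on "$host" /etc/init.d/rabbitmq-server stop
    done

    # Start the nodes one at a time, last-stopped first.
    for host in $START_ORDER; do
        run_on "$host" /etc/init.d/rabbitmq-server start
    done

    # Ask the first node started which members it can see.
    first="${START_ORDER%% *}"
    run_on "$first" rabbitmqctl -n "cluster@$first" cluster_status
}

restart_cluster
```

Run it as-is to review the command list, and set DRY_RUN=0 only once the hostnames and order match your own cluster. If a node's init script hangs on stop, as the start command did in this thread, the stop step would have to be replaced by killing the processes by hand.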
On Thu, Apr 10, 2014 at 4:22 PM, Jason McIntosh <mcintoshj at gmail.com> wrote:
> So now the fun part. I decided to try and rebuild the middle node (I have
> boxes 10, 11 and 12). However, I can't get the middle node to reconnect to
> the cluster. Removing its mnesia directory allowed it to start, but it
> can't rejoin the cluster. So I tried removing the node from the cluster,
> e.g.:
>
> rabbitmqctl -n cluster@rabbitmqm10 forget_cluster_node cluster@rabbitmqm11
>
> But the above never responds - it's just sitting there hanging.
>
> Running rabbitmqctl -n cluster@rabbitmqm11 status from the other nodes
> works fine. I'm about at a loss as to how the heck to repair things. I
> can't remove the node from the cluster, I can't start it with the mnesia
> directory in its current state, and removing the mnesia directory and
> trying to add it back in fails with "....done
> (already_member).". Running rabbitmqctl update_cluster_nodes
> cluster@rabbitmqm10 also just sits there without responding.
>
>
> I'm starting to really worry I'm going to have to completely rebuild my
> cluster...
> Jason
>
>
>
> On Thu, Apr 10, 2014 at 2:55 PM, Jason McIntosh <mcintoshj at gmail.com> wrote:
>
>> Not sure what's going on here. Just upgraded my cluster from 3.2.3 to
>> 3.2.4 (including a restart of the machine). On startup, two of my initial
>> nodes started fine, but when the third node in the cluster started, the
>> "/etc/init.d/rabbitmq-server start" just sits at "Starting rabbitmq-server:
>> " without ever finishing. Doing a rabbitmqctl status shows:
>> Status of node cluster@rabbitmqm11p ...
>> [{pid,62505},
>> {running_applications,[{os_mon,"CPO CXC 138 46","2.2.14"},
>> {inets,"INETS CXC 138 49","5.9.8"},
>> {mnesia,"MNESIA CXC 138 12","4.11"},
>> {amqp_client,"RabbitMQ AMQP Client","3.2.4"},
>> {xmerl,"XML parser","1.3.6"},
>> {eldap,"Ldap api","1.0.2"},
>> {sasl,"SASL CXC 138 11","2.3.4"},
>> {stdlib,"ERTS CXC 138 10","1.19.4"},
>> {kernel,"ERTS CXC 138 10","2.16.4"}]},
>> {os,{unix,linux}},
>> {erlang_version,"Erlang R16B03-1 (erts-5.10.4) [source] [64-bit]
>> [smp:24:24] [async-threads:30] [hipe] [kernel-poll:true]\n"},
>> {memory,[{total,48504352},
>> {connection_procs,2808},
>> {queue_procs,0},
>> {plugins,0},
>> {other_proc,16290632},
>> {mnesia,1783536},
>> {mgmt_db,0},
>> {msg_index,0},
>> {other_ets,1120896},
>> {binary,725448},
>> {code,19691642},
>> {atom,703377},
>> {other_system,8186013}]},
>> {file_descriptors,[{total_limit,12188},
>> {total_used,0},
>> {sockets_limit,10967},
>> {sockets_used,0}]},
>> {processes,[{limit,1048576},{used,117}]},
>> {run_queue,0},
>> {uptime,83}]
>> ...done.
>>
>>
>> In the web management interface, I see this:
>> Node statistics not available
>> Memory details
>>
>> Connections            2.7kB
>> Queues                 0B
>> Plugins                0B
>> Other process memory   16MB
>> Mnesia                 1.7MB
>> Message store index    0B
>> Management database    0B
>> Other ETS tables       1.1MB
>> Binaries               708kB
>> Code                   19MB
>> Atoms                  687kB
>> Other system           7.8MB
>>
>>
>> So rabbit appears to have sort of started, but certain things are not
>> started (e.g. plugins). The plugins list is:
>> [e] amqp_client 3.2.4
>> [ ] cowboy 0.5.0-rmq3.2.4-git4b93c2d
>> [ ] eldap 3.2.4-gite309de4
>> [e] mochiweb 2.7.0-rmq3.2.4-git680dba8
>> [ ] rabbitmq_amqp1_0 3.2.4
>> [E] rabbitmq_auth_backend_ldap 3.2.4
>> [ ] rabbitmq_auth_mechanism_ssl 3.2.4
>> [E] rabbitmq_consistent_hash_exchange 3.2.4
>> [E] rabbitmq_federation 3.2.4
>> [E] rabbitmq_federation_management 3.2.4
>> [ ] rabbitmq_jsonrpc 3.2.4
>> [ ] rabbitmq_jsonrpc_channel 3.2.4
>> [ ] rabbitmq_jsonrpc_channel_examples 3.2.4
>> [E] rabbitmq_management 3.2.4
>> [E] rabbitmq_management_agent 3.2.4
>> [E] rabbitmq_management_visualiser 3.2.4
>> [ ] rabbitmq_mqtt 3.2.4
>> [E] rabbitmq_shovel 3.2.4
>> [E] rabbitmq_shovel_management 3.2.4
>> [ ] rabbitmq_stomp 3.2.4
>> [ ] rabbitmq_tracing 3.2.4
>> [e] rabbitmq_web_dispatch 3.2.4
>> [ ] rabbitmq_web_stomp 3.2.4
>> [ ] rabbitmq_web_stomp_examples 3.2.4
>> [ ] rfc4627_jsonrpc 3.2.4-git5e67120
>> [ ] sockjs 0.3.4-rmq3.2.4-git3132eb9
>> [e] webmachine 1.10.3-rmq3.2.4-gite9359c7
>>
>>
>> Any suggestions on next steps on debugging this? Or what I can do to get
>> this back up and in a "healthy" state?
>>
>> Thanks!
>> Jason
>>
>>
>>
>>
>> --
>> Jason McIntosh
>> https://github.com/jasonmcintosh/
>> 573-424-7612
>>