[rabbitmq-discuss] Clustered nodes failure

Simon MacMullen simon at rabbitmq.com
Wed Sep 19 13:08:31 BST 2012


Hi Ian.

We've fixed quite a lot of bugs in mirrored queues since 2.7.1. So I 
would have to suggest an upgrade to 2.8.6 first of all.
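
For anyone wanting to check which nodes are still below that version, here is a minimal Python sketch. It assumes the output of `rabbitmqctl status` lists the broker as an Erlang tuple of the form {rabbit,"RabbitMQ","2.7.1"}; that format and the helper names `parse_rabbit_version` and `needs_upgrade` are illustrative assumptions, not part of RabbitMQ itself.

```python
import re

# Assumed shape of the relevant part of `rabbitmqctl status` output:
#   {running_applications,[{rabbit,"RabbitMQ","2.7.1"}, ...]}
RABBIT_APP_RE = re.compile(r'\{rabbit,\s*"RabbitMQ",\s*"(\d+(?:\.\d+)*)"\}')


def parse_rabbit_version(status_output):
    """Return the broker version as a tuple of ints, or None if not found."""
    match = RABBIT_APP_RE.search(status_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(status_output, minimum=(2, 8, 6)):
    """True if a version was found and it is older than `minimum`."""
    version = parse_rabbit_version(status_output)
    return version is not None and version < minimum
```

Capture the text from `rabbitmqctl status` on each node and pass it to `needs_upgrade`; tuples compare element by element, so (2, 7, 1) sorts before (2, 8, 6).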

Cheers, Simon

On 19/09/12 12:45, Ian wrote:
> Hi all,
>
> I wonder if anyone can help diagnose problems we've been having with our
> 2-node RabbitMQ cluster, which sporadically seizes up completely. None of
> the applications can get through to Rabbit, though it is still up and
> running. CPU and RAM spike to 100%. The Management UI cannot be
> reached and we end up having to restart the nodes to get service back.
> Sometimes it does not come back gracefully and requires a reboot.
>
> Some stats:
>
>     * Both nodes are 4-core, 8 GB RAM CentOS 6.2 virtual machines running
>       on a VMware ESXi 4.1 host. We are running RabbitMQ version 2.7.1 on
>       Erlang R14B04.
>     * Looking at our metrics right now I see around:
>           o 1000 queues
>           o 4000 channels
>           o 8000 bindings
>           o 16 exchanges
>     * Memory usage, Erlang processes, file descriptors and socket
>       descriptors are generally low and healthy
>
> Analysing errors in the rabbit logs from a recent failure reveals:
>
>     * Before the failure we have a bunch of background errors which may
>       be the fault of our applications, such as "no binding X between
>       exchange Y in vhost '/' and queue Z in vhost '/'"
>     * As we ramp up to the failure we see:
>           o Two errors like this:
>                 + “** Generic server <0.16813.1677> terminating ** Last
>                   message in was {'$gen_cast',
>                   {run_backing_queue,rabbit_mirror_queue_master,
>                   #Fun<rabbit_mirror_queue_master.4.85178772>}} ** When
>                   Server state == {lim,0,undefined,false,[],0} ** Reason
>                   for termination == ** {function_clause,
>                   [{rabbit_limiter,handle_cast,
>                   [{run_backing_queue,rabbit_mirror_queue_master,
>                   #Fun<rabbit_mirror_queue_master.4.85178772>},
>                   {lim,0,undefined,false,[],0}]},
>                   {gen_server2,handle_msg,2},
>                   {proc_lib,init_p_do_apply,3}]}”
>           o A handful like this:
>                 + “connection <0.14270.7735>, channel 38 - error:
>                   {amqp_error,command_invalid,"second 'channel.open'
>                   seen",'channel.open'}”
>           o A couple of these:
>                 + “connection <0.158.6322>, channel 135 - error:
>                   {amqp_error,not_found, "no queue
>                   'InRunning.WebClient.SessionId[l0mpn3egx5n0yj0lbs1hcehj]'
>                   in vhost '/'", 'basic.get'}”
>           o And then all these:
>                 + exception on TCP connection <0.14270.7735> from
>                   WWW.XXX.YYY.ZZZ:59106 {inet_error,enotconn}
>                 + exception on TCP connection <0.14577.1677> from
>                   WWW.XXX.YYY.ZZZ:53163 {inet_error,enotconn}
>                 + exception on TCP connection <0.1520.5487> from
>                   WWW.XXX.YYY.ZZZ:53435 {timeout,running}
>                 + exception on TCP connection <0.158.6322> from
>                   WWW.XXX.YYY.ZZZ:63187
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.17097.1918> from
>                   WWW.XXX.YYY.ZZZ:55161
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.18340.7733> from
>                   WWW.XXX.YYY.ZZZ:52868 {inet_error,enotconn}
>                 + exception on TCP connection <0.24514.6782> from
>                   WWW.XXX.YYY.ZZZ:64362 {timeout,blocking}
>                 + exception on TCP connection <0.24518.6782> from
>                   WWW.XXX.YYY.ZZZ:61252 {timeout,blocking}
>                 + exception on TCP connection <0.24524.6782> from
>                   WWW.XXX.YYY.ZZZ:55845 {timeout,blocking}
>                 + exception on TCP connection <0.24528.6782> from
>                   WWW.XXX.YYY.ZZZ:53434 {timeout,blocking}
>                 + exception on TCP connection <0.24532.6782> from
>                   WWW.XXX.YYY.ZZZ:54398 {timeout,blocking}
>                 + exception on TCP connection <0.24536.6782> from
>                   WWW.XXX.YYY.ZZZ:58878 {timeout,blocking}
>                 + exception on TCP connection <0.24552.6782> from
>                   WWW.XXX.YYY.ZZZ:63155 {timeout,blocking}
>                 + exception on TCP connection <0.2577.2793> from
>                   WWW.XXX.YYY.ZZZ:52752
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.26105.2580> from
>                   WWW.XXX.YYY.ZZZ:50364
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.27505.6740> from
>                   WWW.XXX.YYY.ZZZ:56170
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.27741.2921> from
>                   WWW.XXX.YYY.ZZZ:54600
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.28602.6323> from
>                   WWW.XXX.YYY.ZZZ:56863
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.30059.3135> from
>                   WWW.XXX.YYY.ZZZ:57078 {writer,send_failed,{error,closed}}
>                 + exception on TCP connection <0.5634.2393> from
>                   WWW.XXX.YYY.ZZZ:53807
>                   {writer,send_failed,{error,enotconn}}
>                 + exception on TCP connection <0.6691.6783> from
>                   WWW.XXX.YYY.ZZZ:64363 {timeout,blocking}
>
> Can anyone help?
>
> Thanks,
>
> Ian
>
>
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
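
When triaging a long run of connection exceptions like the ones quoted above, it can help to tally them by reason term to see which failure mode dominates. The following is a rough Python sketch; the regular expression is an assumption based only on the excerpts in this thread (it handles reason terms nested at most one brace level deep), and `tally_connection_exceptions` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed entry shape, taken from the quoted log excerpts:
#   exception on TCP connection <0.158.6322> from HOST:PORT {reason,...}
# The reason term may contain one nested tuple, e.g.
#   {writer,send_failed,{error,enotconn}}
EXCEPTION_RE = re.compile(
    r"exception on TCP connection <[\d.]+> from \S+ "
    r"(\{(?:[^{}]|\{[^{}]*\})*\})"
)


def tally_connection_exceptions(log_text):
    """Count connection exceptions in `log_text`, grouped by reason term."""
    # Entries may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter(match.group(1) for match in EXCEPTION_RE.finditer(flat))
```

Reading the broker log into a string and calling `tally_connection_exceptions(text).most_common()` lists the reasons from most to least frequent, which makes it easier to tell timeouts apart from sockets that were already closed.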


-- 
Simon MacMullen
RabbitMQ, VMware

