[rabbitmq-discuss] Hung Server Upgrading from 3.1.1 to 3.1.5 in a cluster

Simon MacMullen simon at rabbitmq.com
Mon Aug 19 16:14:28 BST 2013


Thanks. BTW, we have seen that
rabbit_mirror_queue_master:stop_all_slaves/2 stacktrace in recent soak
testing of HA code, but not since fixing various other bugs in that
area. So I think there was a real bug there (albeit one that was hard to
hit), but it's now fixed. Hopefully.

Cheers, Simon

On 16/08/13 19:49, Chris wrote:
> FYI: I have not been able to reproduce this problem since it happened.
> Perhaps my servers were somehow in a bad state to begin with. So... I
> guess you can consider this low-priority, for your informational
> purposes only. ;-)
>
> -Chris
>
>
> On Thu, Aug 15, 2013 at 4:09 PM, Chris <stuff at moesel.net> wrote:
>
>     Hello,
>
>     I had a running cluster of two RabbitMQ 3.1.1 servers on Red Hat 6.2.
>     I left both running and then attempted to upgrade one (via yum).
>     After the upgrade, rabbitmqctl cluster_status reported that the
>     cluster was fine, but none of my consumers seemed to be working.
>
>     I then attempted to upgrade the other, hoping that would fix things,
>     but the upgrade just hung. After killing the upgrade (Ctrl-C), I
>     found that I could no longer stop rabbitmq-server (neither via the
>     service script nor via rabbitmqctl). I had to kill it manually. After
>     killing it, I re-ran the upgrade and all was well.
>
>     Looking in the logs, I then saw a BUNCH of errors with timestamps
>     corresponding to when I upgraded the first server. It seems that
>     upgrade didn't go cleanly from the point of view of the remaining
>     3.1.1 node, and that might be responsible for all the trouble. Did I
>     just get unlucky?
>
>     Here's the SASL log:
>
>     =CRASH REPORT==== 15-Aug-2013::14:27:49 ===
>        crasher:
>          initial call: gen:init_it/6
>          pid: <0.271.0>
>          registered_name: []
>          exception exit: {{badmatch,{error,not_found}},
>                           [{rabbit_mirror_queue_master,stop_all_slaves,2,
>                                [{file,"src/rabbit_mirror_queue_master.erl"},
>                                 {line,179}]},
>                            {rabbit_mirror_queue_master,delete_and_terminate,2,
>                                [{file,"src/rabbit_mirror_queue_master.erl"},
>                                 {line,175}]},
>                            {rabbit_amqqueue_process,'-terminate/2-fun-3-',5,
>                                [{file,"src/rabbit_amqqueue_process.erl"},
>                                 {line,162}]},
>                            {rabbit_amqqueue_process,terminate_shutdown,2,
>                                [{file,"src/rabbit_amqqueue_process.erl"},
>                                 {line,272}]},
>                            {gen_server2,terminate,3,
>                                [{file,"src/gen_server2.erl"},{line,1031}]},
>                            {proc_lib,wake_up,3,
>                                [{file,"proc_lib.erl"},{line,249}]}]}
>            in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1034)
>          ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.148.0>]
>          messages: []
>          links: [<0.270.0>]
>          dictionary: [{guid,{{3434499189,622214121,884364685,3594937084},1}}]
>          trap_exit: true
>          status: running
>          heap_size: 1598
>          stack_size: 27
>          reductions: 9106
>        neighbours:
>
>     =SUPERVISOR REPORT==== 15-Aug-2013::14:27:49 ===
>           Supervisor: {local,rabbit_mirror_queue_slave_sup}
>           Context:    child_terminated
>           Reason:     {{badmatch,{error,not_found}},
>                        [{rabbit_mirror_queue_master,stop_all_slaves,2,
>                             [{file,"src/rabbit_mirror_queue_master.erl"},
>                              {line,179}]},
>                         {rabbit_mirror_queue_master,delete_and_terminate,2,
>                             [{file,"src/rabbit_mirror_queue_master.erl"},
>                              {line,175}]},
>                         {rabbit_amqqueue_process,'-terminate/2-fun-3-',5,
>                             [{file,"src/rabbit_amqqueue_process.erl"},{line,162}]},
>                         {rabbit_amqqueue_process,terminate_shutdown,2,
>                             [{file,"src/rabbit_amqqueue_process.erl"},{line,272}]},
>                         {gen_server2,terminate,3,
>                             [{file,"src/gen_server2.erl"},{line,1031}]},
>                         {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>           Offender:   [{pid,<0.271.0>},
>                        {name,rabbit_mirror_queue_slave},
>                        {mfa,
>                            {rabbit_mirror_queue_slave,start_link,
>                                [{amqqueue,
>                                     {resource,<<"acs">>,queue,
>                                         <<"replies.4a0e284c-1662-463a-b363-cbb4e9557266">>},
>                                     true,false,none,
>                                     [{<<"x-expires">>,signedint,600000}],
>                                     <7111.3423.0>,[],[],
>                                     [{vhost,<<"acs">>},
>                                      {name,<<"ha-acs">>},
>                                      {pattern,<<".*">>},
>                                      {definition,
>                                          [{<<"ha-mode">>,<<"exactly">>},
>                                           {<<"ha-params">>,2}]},
>                                      {priority,0}],
>                                     [{<7111.3424.0>,<7111.3423.0>},
>                                      {<7111.8011.82>,<7111.8010.82>},
>                                      {<0.27964.278>,<0.27962.278>}]}]}},
>                        {restart_type,temporary},
>                        {shutdown,4294967295},
>                        {child_type,worker}]
>
>     Thanks!
>     Chris
>
>
>
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>


-- 
Simon MacMullen
RabbitMQ, Pivotal

