[rabbitmq-discuss] Hung Server Upgrading from 3.1.1 to 3.1.5 in a cluster

Chris stuff at moesel.net
Fri Aug 16 19:49:11 BST 2013


FYI: I have not been able to reproduce this problem since it happened.
Perhaps my servers were in a bad state to begin with, so you can consider
this low-priority and for informational purposes only. ;-)

-Chris


On Thu, Aug 15, 2013 at 4:09 PM, Chris <stuff at moesel.net> wrote:

> Hello,
>
> I had a running cluster of two RabbitMQ 3.1.1 servers on Red Hat 6.2.  I
> left both running and then attempted to upgrade one (via yum).  After the
> upgrade, rabbitmqctl reported that the cluster_status was good, but none
> of my consumers seemed to be working.
>
> I then attempted to upgrade the other, hoping that would fix things, but
> the upgrade just hung.  After killing the upgrade (Ctrl-C) I noticed that I
> could no longer stop rabbitmq-server (neither via the service script nor
> via rabbitmqctl).  I had to kill it manually.  After killing it, I re-ran
> the upgrade and all was well.
>
> Looking in the logs, I then saw a BUNCH of errors with timestamps
> corresponding to when I upgraded the first server.  It seems the upgrade
> wasn't handled cleanly by the remaining 3.1.1 node, which might be
> responsible for all the trouble.  Did I just get unlucky?
>
> Here's the SASL log:
>
> =CRASH REPORT==== 15-Aug-2013::14:27:49 ===
>   crasher:
>     initial call: gen:init_it/6
>     pid: <0.271.0>
>     registered_name: []
>     exception exit: {{badmatch,{error,not_found}},
>                      [{rabbit_mirror_queue_master,stop_all_slaves,2,
>                           [{file,"src/rabbit_mirror_queue_master.erl"},
>                            {line,179}]},
>                       {rabbit_mirror_queue_master,delete_and_terminate,2,
>                           [{file,"src/rabbit_mirror_queue_master.erl"},
>                            {line,175}]},
>                       {rabbit_amqqueue_process,'-terminate/2-fun-3-',5,
>                           [{file,"src/rabbit_amqqueue_process.erl"},
>                            {line,162}]},
>                       {rabbit_amqqueue_process,terminate_shutdown,2,
>                           [{file,"src/rabbit_amqqueue_process.erl"},
>                            {line,272}]},
>                       {gen_server2,terminate,3,
>                           [{file,"src/gen_server2.erl"},{line,1031}]},
>                       {proc_lib,wake_up,3,
>                           [{file,"proc_lib.erl"},{line,249}]}]}
>       in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1034)
>     ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.148.0>]
>     messages: []
>     links: [<0.270.0>]
>     dictionary: [{guid,{{3434499189,622214121,884364685,3594937084},1}}]
>     trap_exit: true
>     status: running
>     heap_size: 1598
>     stack_size: 27
>     reductions: 9106
>   neighbours:
>
> =SUPERVISOR REPORT==== 15-Aug-2013::14:27:49 ===
>      Supervisor: {local,rabbit_mirror_queue_slave_sup}
>      Context:    child_terminated
>      Reason:     {{badmatch,{error,not_found}},
>                   [{rabbit_mirror_queue_master,stop_all_slaves,2,
>                        [{file,"src/rabbit_mirror_queue_master.erl"},
>                         {line,179}]},
>                    {rabbit_mirror_queue_master,delete_and_terminate,2,
>                        [{file,"src/rabbit_mirror_queue_master.erl"},
>                         {line,175}]},
>                    {rabbit_amqqueue_process,'-terminate/2-fun-3-',5,
>                        [{file,"src/rabbit_amqqueue_process.erl"},
>                         {line,162}]},
>                    {rabbit_amqqueue_process,terminate_shutdown,2,
>                        [{file,"src/rabbit_amqqueue_process.erl"},
>                         {line,272}]},
>                    {gen_server2,terminate,3,
>                        [{file,"src/gen_server2.erl"},{line,1031}]},
>                    {proc_lib,wake_up,3,
>                        [{file,"proc_lib.erl"},{line,249}]}]}
>      Offender:   [{pid,<0.271.0>},
>                   {name,rabbit_mirror_queue_slave},
>                   {mfa,
>                       {rabbit_mirror_queue_slave,start_link,
>                           [{amqqueue,
>                                {resource,<<"acs">>,queue,
>                                    <<"replies.4a0e284c-1662-463a-b363-cbb4e9557266">>},
>                                true,false,none,
>                                [{<<"x-expires">>,signedint,600000}],
>                                <7111.3423.0>,[],[],
>                                [{vhost,<<"acs">>},
>                                 {name,<<"ha-acs">>},
>                                 {pattern,<<".*">>},
>                                 {definition,
>                                     [{<<"ha-mode">>,<<"exactly">>},
>                                      {<<"ha-params">>,2}]},
>                                 {priority,0}],
>                                [{<7111.3424.0>,<7111.3423.0>},
>                                 {<7111.8011.82>,<7111.8010.82>},
>                                 {<0.27964.278>,<0.27962.278>}]}]}},
>                   {restart_type,temporary},
>                   {shutdown,4294967295},
>                   {child_type,worker}]
>
> Thanks!
> Chris
>
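
A minimal Python sketch for checking whether SASL report timestamps line up
with an upgrade, as described in the quoted message. It counts report headers
of the form "=CRASH REPORT==== 15-Aug-2013::14:27:49 ===" per timestamp and
report kind. The function name summarize_reports and the tolerance for a
leading "> " quote prefix are assumptions made for illustration; this is not
a tool that ships with RabbitMQ.

```python
import re
from collections import Counter

# Matches SASL report headers such as
#   =CRASH REPORT==== 15-Aug-2013::14:27:49 ===
# An optional "> " email-quote prefix is tolerated (an assumption, so that
# logs pasted into a mail thread can be scanned as well).
HEADER = re.compile(
    r"^(?:>\s?)*=(?P<kind>[A-Z ]+) REPORT==== "
    r"(?P<stamp>\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===\s*$"
)


def summarize_reports(lines):
    """Count report headers per (timestamp, kind) pair.

    `lines` is any iterable of strings, for example an open log file.
    """
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match:
            counts[(match.group("stamp"), match.group("kind"))] += 1
    return counts
```

Passing an open SASL log file to summarize_reports and printing the sorted
items gives a quick view of which seconds produced crash or supervisor
reports, which can then be compared against the time the package upgrade ran.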