[rabbitmq-discuss] Hung Server Upgrading from 3.1.1 to 3.1.5 in a cluster

Chris stuff at moesel.net
Thu Aug 15 21:09:46 BST 2013


Hello,

I had a running cluster of two RabbitMQ 3.1.1 servers on Red Hat 6.2.  I
left both running and then attempted to upgrade one of them (via yum).
After the upgrade, rabbitmqctl cluster_status looked good, but none of my
consumers seemed to be working.

I then attempted to upgrade the other node, hoping that would fix things,
but the upgrade just hung.  After killing the upgrade (Ctrl-C) I noticed
that I couldn't stop rabbitmq-server anymore (neither via the service
script nor via rabbitmqctl).  I had to kill it manually.  After killing it,
I re-ran the upgrade and all was well.

Looking in the logs, I then saw a BUNCH of errors with timestamps
corresponding to when I upgraded the first server.  It seems the first
upgrade didn't go cleanly for the node that was still running 3.1.1, and
that might be responsible for all the trouble.  Did I just get unlucky?

Here's the SASL log:

=CRASH REPORT==== 15-Aug-2013::14:27:49 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.271.0>
    registered_name: []
    exception exit: {{badmatch,{error,not_found}},
                     [{rabbit_mirror_queue_master,stop_all_slaves,2,
                          [{file,"src/rabbit_mirror_queue_master.erl"},
                           {line,179}]},
                      {rabbit_mirror_queue_master,delete_and_terminate,2,
                          [{file,"src/rabbit_mirror_queue_master.erl"},
                           {line,175}]},
                      {rabbit_amqqueue_process,'-terminate/2-fun-3-',5,
                          [{file,"src/rabbit_amqqueue_process.erl"},
                           {line,162}]},
                      {rabbit_amqqueue_process,terminate_shutdown,2,
                          [{file,"src/rabbit_amqqueue_process.erl"},
                           {line,272}]},
                      {gen_server2,terminate,3,
                          [{file,"src/gen_server2.erl"},{line,1031}]},
                      {proc_lib,wake_up,3,
                          [{file,"proc_lib.erl"},{line,249}]}]}
      in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1034)
    ancestors: [rabbit_mirror_queue_slave_sup,rabbit_sup,<0.148.0>]
    messages: []
    links: [<0.270.0>]
    dictionary: [{guid,{{3434499189,622214121,884364685,3594937084},1}}]
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 9106
  neighbours:

=SUPERVISOR REPORT==== 15-Aug-2013::14:27:49 ===
     Supervisor: {local,
                                           rabbit_mirror_queue_slave_sup}
     Context:    child_terminated
     Reason:     {{badmatch,{error,not_found}},
                  [{rabbit_mirror_queue_master,stop_all_slaves,2,
                       [{file,"src/rabbit_mirror_queue_master.erl"},
                        {line,179}]},
                   {rabbit_mirror_queue_master,delete_and_terminate,2,
                       [{file,"src/rabbit_mirror_queue_master.erl"},
                        {line,175}]},
                   {rabbit_amqqueue_process,'-terminate/2-fun-3-',5,
                       [{file,"src/rabbit_amqqueue_process.erl"},{line,162}]},
                   {rabbit_amqqueue_process,terminate_shutdown,2,
                       [{file,"src/rabbit_amqqueue_process.erl"},{line,272}]},
                   {gen_server2,terminate,3,
                       [{file,"src/gen_server2.erl"},{line,1031}]},
                   {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
     Offender:   [{pid,<0.271.0>},
                  {name,rabbit_mirror_queue_slave},
                  {mfa,
                      {rabbit_mirror_queue_slave,start_link,
                          [{amqqueue,
                               {resource,<<"acs">>,queue,
                                <<"replies.4a0e284c-1662-463a-b363-cbb4e9557266">>},
                               true,false,none,
                               [{<<"x-expires">>,signedint,600000}],
                               <7111.3423.0>,[],[],
                               [{vhost,<<"acs">>},
                                {name,<<"ha-acs">>},
                                {pattern,<<".*">>},
                                {definition,
                                    [{<<"ha-mode">>,<<"exactly">>},
                                     {<<"ha-params">>,2}]},
                                {priority,0}],
                               [{<7111.3424.0>,<7111.3423.0>},
                                {<7111.8011.82>,<7111.8010.82>},
                                {<0.27964.278>,<0.27962.278>}]}]}},
                  {restart_type,temporary},
                  {shutdown,4294967295},
                  {child_type,worker}]
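
In case it helps anyone check whether their own SASL reports line up with
an upgrade, here is a rough Python sketch (not part of RabbitMQ).  It only
assumes the report header format shown above; the function names and the
time window are made up for illustration, and the month parsing assumes an
English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Header format copied from the reports above, e.g.
# "=CRASH REPORT==== 15-Aug-2013::14:27:49 ==="
HEADER = re.compile(
    r"^=(?P<kind>[A-Z ]+) REPORT==== "
    r"(?P<ts>\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ==="
)


def report_headers(lines):
    """Yield (kind, timestamp) for every SASL report header in lines."""
    for line in lines:
        m = HEADER.match(line)
        if m:
            # %b (month abbreviation) is locale-dependent; English assumed.
            ts = datetime.strptime(m.group("ts"), "%d-%b-%Y::%H:%M:%S")
            yield m.group("kind"), ts


def count_in_window(lines, start, end):
    """Count reports per kind whose timestamp lies within [start, end]."""
    return Counter(
        kind for kind, ts in report_headers(lines) if start <= ts <= end
    )


if __name__ == "__main__":
    # Two header lines taken from the log above; the window is a
    # hypothetical guess at when the first upgrade ran.
    sample = [
        "=CRASH REPORT==== 15-Aug-2013::14:27:49 ===",
        "=SUPERVISOR REPORT==== 15-Aug-2013::14:27:49 ===",
    ]
    window = (datetime(2013, 8, 15, 14, 20), datetime(2013, 8, 15, 14, 35))
    print(count_in_window(sample, *window))
```

To run it against a real log, pass an open file object instead of the
sample list and set the window to the time the package upgrade ran.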

Thanks!
Chris