[rabbitmq-discuss] Errors starting RabbitMQ when cluster membership changes

Matt Pietrek mpietrek at skytap.com
Fri Sep 28 20:23:38 BST 2012


In my environment I run numerous RabbitMQ nodes for testing purposes, and I
frequently change the cluster configuration (e.g. which nodes are part of a
cluster).

Using 2.8.6 and subsequently 2.8.7 on Erlang R15B01, I've recently seen
RabbitMQ fail to start on a node several times. The cause seems to be
related to a node that is no longer part of the cluster.

For example, at one point I had a three-node cluster: play, play2, and
util. I then removed util from the cluster, although to be honest I did so
simply by changing the rabbitmq.config file, rather than by explicitly
running rabbitmqctl stop_app while the cluster was still running.

My steps:

   - With the three-node cluster running, stop all brokers
   - Create a new rabbitmq.config file with just two brokers
   - Attempt to start the new cluster


Here's my revised RabbitMQ config file. It no longer lists rabbit@util,
which is no longer part of the cluster, but note the references to
rabbit@util in the logs that follow:
--------
[
{rabbit, [{cluster_nodes, [rabbit@play,rabbit@play2]}, {disk_free_limit, 104857600}]},
{mnesia, [{debug, trace}]}
].
--------
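
As a sanity check that the file really names only two nodes, something like
the following could be compiled and called from an erl shell. This is only a
sketch: the module and function names are made up for illustration, and it
assumes the config file holds a single Erlang term, as the one above does.

```erlang
-module(cfg_check).
-export([cluster_nodes/1, cluster_nodes_from_file/1]).

%% Pull the cluster_nodes entry out of an already-parsed config term.
%% Returns [] when the rabbit section or the key is absent.
cluster_nodes(Config) ->
    Rabbit = proplists:get_value(rabbit, Config, []),
    proplists:get_value(cluster_nodes, Rabbit, []).

%% Read the config file (one term terminated by a full stop) and
%% return the nodes it lists.
cluster_nodes_from_file(Path) ->
    {ok, [Config]} = file:consult(Path),
    cluster_nodes(Config).
```

Calling cfg_check:cluster_nodes_from_file("rabbitmq.config") on the file
above should list only rabbit@play and rabbit@play2.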


The rabbit@play.log file:

=ERROR REPORT==== 28-Sep-2012::11:31:11 ===
Mnesia(rabbit@play): ** ERROR ** (core dumped to file:
"/home/mpietrek/MnesiaCore.rabbit@play_1348_857071_113813")
 ** FATAL ** Failed to merge schema: Bad cookie in table definition
mirrored_sup_childspec: rabbit@play =
{cstruct,mirrored_sup_childspec,ordered_set,[rabbit@util,rabbit@play2,rabbit@play],
[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],
[],[],[],{{1346,266106,862481},rabbit@play},{{4,0},{rabbit@util,{1348,781672,983885}}}},
rabbit@util =
{cstruct,mirrored_sup_childspec,ordered_set,[rabbit@util],
[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],
[],[],[],{{1348,854318,794171},rabbit@util},{{2,0},[]}}


=ERROR REPORT==== 28-Sep-2012::11:31:21 ===
** Generic server mnesia_monitor terminating
** Last message in was {'EXIT',<0.44.0>,killed}
** When Server state == {state,<0.44.0>,[],[],true,[],undefined,[]}
** Reason for termination ==
** killed

=ERROR REPORT==== 28-Sep-2012::11:31:21 ===
** Generic server mnesia_recover terminating
** Last message in was {'EXIT',<0.44.0>,killed}
** When Server state ==
{state,<0.44.0>,undefined,undefined,undefined,0,false,
                               true,[]}
** Reason for termination ==
** killed

=ERROR REPORT==== 28-Sep-2012::11:31:21 ===
** Generic server mnesia_snmp_sup terminating
** Last message in was {'EXIT',<0.44.0>,killed}
** When Server state == {state,
                            {local,mnesia_snmp_sup},
                            simple_one_for_one,
                            [{child,undefined,mnesia_snmp_sup,
                                 {mnesia_snmp_hook,start,[]},
                                 transient,3000,worker,
                                 [mnesia_snmp_sup,mnesia_snmp_hook,
                                  supervisor]}],
                            undefined,0,86400000,[],mnesia_snmp_sup,[]}
** Reason for termination ==
** killed

=ERROR REPORT==== 28-Sep-2012::11:31:21 ===
** Generic server mnesia_subscr terminating
** Last message in was {'EXIT',<0.44.0>,killed}
** When Server state == {state,<0.44.0>,57361}
** Reason for termination ==
** killed

=INFO REPORT==== 28-Sep-2012::11:31:21 ===
    application: mnesia
    exited: {shutdown,{mnesia_sup,start,[normal,[]]}}
    type: permanent
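
The two cookies in the FATAL message differ, which is presumably what Mnesia
is objecting to. I am assuming here (I have not checked the Mnesia source)
that the first element of each cookie is an erlang:now()-style
{MegaSecs, Secs, MicroSecs} tuple recording when that copy of the table was
created. Under that assumption, a small helper like this one (the module name
is made up) turns a cookie into a readable UTC time:

```erlang
-module(cookie_age).
-export([created/1]).

%% Assumption: a table cookie has the shape {Now, Node}, where Now is an
%% erlang:now()-style {MegaSecs, Secs, MicroSecs} tuple.
%% Returns the node together with the UTC date and time encoded in Now.
created({{_Mega, _Secs, _Micro} = Now, Node}) ->
    {Node, calendar:now_to_universal_time(Now)}.
```

With the values from the log, cookie_age:created({{1346,266106,862481},
rabbit@play}) gives {rabbit@play,{{2012,8,29},{18,48,26}}}, while
cookie_age:created({{1348,854318,794171}, rabbit@util}) gives
{rabbit@util,{{2012,9,28},{17,45,18}}}. If the assumption holds, the copy of
the table on util was created on the day of the failed start, weeks after
the copy on play.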


And finally, the output from rabbitmq-server:

Mnesia(rabbit@play): mnesia_late_loader starting: <0.81.0>
Mnesia(rabbit@play): Transaction {tid,21431,<0.82.0>} calling
#Fun<mnesia_schema.42.93736344> with [] failed:
 {aborted,{throw,[66,97,100,32,99,111,111,107,105,101,32,105,110,32,116,97,98,
                  108,101,32,100,101,102,105,110,105,116,105,111,110,32,
                  "mirrored_sup_childspec",58,32,"rabbit@play",32,61,32,
                  [123,
                   ["cstruct",44,"mirrored_sup_childspec",44,"ordered_set",44,
                    [91,["rabbit@util",44,"rabbit@play2",44,"rabbit@play"],93],
                    44,"[]",44,"[]",44,"0",44,"read_write",44,"false",44,"[]",
                    44,"[]",44,"false",44,"mirrored_sup_childspec",44,
                    [91,["key",44,"mirroring_pid",44,"childspec"],93],
                    44,"[]",44,"[]",44,"[]",44,
                    [123,
                     [[123,["1346",44,"266106",44,"862481"],125],
                      44,"rabbit@play"],
                     125],
                    44,
                    [123,
                     [[123,["4",44,"0"],125],
                      44,
                      [123,
                       ["rabbit@util",44,
                        [123,["1348",44,"781672",44,"983885"],125]],
                       125]],
                     125]],
                   125],
                  44,32,"rabbit@util",32,61,32,
                  [123,
                   ["cstruct",44,"mirrored_sup_childspec",44,"ordered_set",44,
                    [91,["rabbit@util"],93],
                    44,"[]",44,"[]",44,"0",44,"read_write",44,"false",44,"[]",
                    44,"[]",44,"false",44,"mirrored_sup_childspec",44,
                    [91,["key",44,"mirroring_pid",44,"childspec"],93],
                    44,"[]",44,"[]",44,"[]",44,
                    [123,
                     [[123,["1348",44,"854318",44,"794171"],125],
                      44,"rabbit@util"],
                     125],
                    44,
                    [123,[[123,["2",44,"0"],125],44,"[]"],125]],
                   125],
                  "\n"]}}
Mnesia(rabbit@play): mnesia_monitor got FATAL ERROR from: <0.81.0>
Mnesia(rabbit@play): mnesia_subscr terminated: killed
{"Kernel pid terminated",application_controller,"{application_start_failure,mnesia,{shutdown,{mnesia_sup,start,[normal,[]]}}}"}

Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,mnesia,{shutdown,{mnesia_sup,start,[normal,[]]}}})
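
The long list of integers in the aborted transaction is just the same "Bad
cookie" message in unflattened form: a nested list mixing character codes
with strings. A minimal sketch for turning such a term back into text
follows. The module and function names are made up, sample/0 holds only the
first part of the thrown term, and the node name is written in its usual
name@host form.

```erlang
-module(mnesia_msg).
-export([to_string/1, sample/0]).

%% Flatten a nested list of character codes and strings into one
%% flat string.
to_string(DeepList) ->
    lists:flatten(DeepList).

%% The leading portion of the thrown term from the rabbitmq-server
%% output, up to and including " = ".
sample() ->
    [66,97,100,32,99,111,111,107,105,101,32,105,110,32,116,97,98,
     108,101,32,100,101,102,105,110,105,116,105,111,110,32,
     "mirrored_sup_childspec",58,32,"rabbit@play",32,61,32].
```

mnesia_msg:to_string(mnesia_msg:sample()) returns
"Bad cookie in table definition mirrored_sup_childspec: rabbit@play = ",
which matches the start of the FATAL line in the log file.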

Matt