[rabbitmq-discuss] Mnesia corrupting after node joining cluster
David Brown
dbrown at prmllc.com
Wed Aug 1 19:15:42 BST 2012
Hi Matthias,
thanks for the quick answers.
> Also, are you sure you actually *need* clustering? It does, by necessity,
> add a significant amount of complexity and possible failure modes, so only
> use it if you have to.
To implement distributed messaging, my understanding is that I've got three
options: clustering, federation, or the shovel. In my case I've got a simple
office network with 16 machines. The apps that run on the machines need to
see all the messages that get published, so I have a simple fanout exchange.
Each app instantiates its own distinct, non-persistent queue. So a cluster
seemed to be the simplest choice, in that there is just a single fanout
exchange and a simple transient queue for each app.
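
To make the topology concrete, here is a minimal sketch of it as a toy
in-memory model in Python. It is not RabbitMQ client code and says nothing
about clustering itself; the class and method names (FanoutExchange,
declare_transient_queue, and so on) are hypothetical and only illustrate
that every app's queue receives a copy of every published message.

```python
from collections import deque
from uuid import uuid4


class FanoutExchange:
    """Toy in-memory stand-in for a fanout exchange (not a RabbitMQ client)."""

    def __init__(self):
        # Maps queue name -> messages waiting in that queue.
        self.queues = {}

    def declare_transient_queue(self):
        # Each app gets its own uniquely named queue bound to the exchange.
        name = "app-" + uuid4().hex
        self.queues[name] = deque()
        return name

    def delete_queue(self, name):
        # A transient queue goes away together with its pending messages.
        self.queues.pop(name, None)

    def publish(self, message):
        # Fanout semantics: every bound queue receives its own copy.
        for pending in self.queues.values():
            pending.append(message)

    def consume(self, name):
        # Drain and return everything currently waiting in one queue.
        pending = self.queues[name]
        messages = list(pending)
        pending.clear()
        return messages


# Sixteen apps, one queue each, as in the office network described above.
exchange = FanoutExchange()
app_queues = [exchange.declare_transient_queue() for _ in range(16)]
exchange.publish("price update")
received = [exchange.consume(q) for q in app_queues]
```

In this model each of the 16 entries in `received` holds the single
published message, which is the behaviour I am after from the real broker.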
Is there any non-clustered way to enable distributed messaging in a single
location (other than Federation and the Shovel)?
Thanks much
David
> ----- Original Message -----
> From: "Matthias Radestock" <matthias at rabbitmq.com>
> To: "David Brown" <dbrown at prmllc.com>; "Discussions about RabbitMQ"
> <rabbitmq-discuss at lists.rabbitmq.com>
> Sent: Wednesday, August 01, 2012 12:46 PM
> Subject: Re: [rabbitmq-discuss] Mnesia corrupting after node joining
> cluster
>
>
>> David,
>>
>> On 01/08/12 18:20, David Brown wrote:
>>> has there been any work on this issue (i.e. errors when doing admin work
>>> on a cluster)?
>>
>> yes, but it will likely be a few months before these changes make it into
>> a release.
>>
>>> I've got a tiny, two node development cluster. Removing
>>> the ram node caused the remaining (disc) node to fail on startup with a
>>> mnesia related error when I restarted it. Eventually, the remaining
>>> (disc) node started up, but it still thinks the other node is clustered
>>> with it. I've tried everything to try and get this node to realize it
>>> is the only node left in the cluster, nothing works. FWIW the removed
>>> node does realize it is no longer part of the two node cluster.
>>>
>>> I was very careful in terms of following the exact steps in the
>>> 'Breaking up a cluster' section on the rabbitmq web site.
>>
>> Hmm. It certainly looks like the disk node still thinks it is clustered
>> with the ram node, and consequently it will fail to merge its schema with
>> it when starting. That really shouldn't happen if you indeed followed the
>> documented steps when breaking up the cluster, in particular the disk
>> node was up and running when removing the ram node.
>>
>> If you can reproduce the problem then please post a transcript of the
>> commands.
>>
>>> At this point, I'm a bit concerned about basing our production
>>> systems around rabbitmq (we're a small hedge fund) when it seems to
>>> fail on the simplest of tasks.
>>
>> Clustering is hardly the "simplest of tasks" - sending and receiving
>> messages is ;)
>>
>> The clustering code hasn't changed much in 4+ years. It is stable but
>> suffers from little mistakes resulting in situations that are hard to
>> recover from - that's what Francesco is addressing.
>>
>> I suggest you conduct some more experiments and keep transcripts of
>> everything you are doing. Then, if you do encounter a weird situation it
>> will be much easier to reproduce and diagnose the problem.
>>
>> Also, are you sure you actually *need* clustering? It does, by necessity,
>> add a significant amount of complexity and possible failure modes, so
>> only use it if you have to.
>>
>> Regards,
>>
>> Matthias.
>>
>>
>