[rabbitmq-discuss] [Q] Cluster configuration is not retained by node after restart
S.Loewenthal
simon at klunky.co.uk
Thu Jun 28 11:00:29 BST 2012
Hi,

I started again with a fresh installation of 2.7.1. There remain errors when I ask it to join a cluster. There is nothing recorded in /var/log/rabbitmq/shutdown_err. I have replaced the server names with Node1 and Node2 respectively to make it easier to follow.

Any ideas?
***************************
**** ** Attempt # 1 ** ****
***************************
rpm -e rabbitmq-server. Removed the contents of /var/lib/rabbitmq and /etc/rabbitmq, then rpm -i rabbitmq-server....rpm
**** Start Node1 ****
[root@Node1 ~]# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
** Verify that the server is really running.
[root@Node1 ~]# ps -aef|grep ra
root     3    2    0 Jun27 ? 00:00:00 [migration/0]
root     5    2    0 Jun27 ? 00:00:00 [migration/0]
rabbitmq 1710 1    0 Jun27 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/epmd -daemon
root     5122 1    0 11:13 ? 00:00:00 /bin/sh /usr/sbin/rabbitmq-server
root     5133 5122 0 11:13 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 5136 5133 3 11:13 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/beam -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -noshell -noinput -sname rabbit@Node1 -boot /var/lib/rabbitmq/mnesia/rabbit@Node1-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@Node1.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@Node1-sasl.log"} -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@Node1"
rabbitmq 5216 5136 0 11:13 ? 00:00:00 /usr/lib64/erlang/lib/os_mon-2.2.7/priv/bin/cpu_sup
rabbitmq 5217 5136 0 11:13 ? 00:00:00 inet_gethost 4
rabbitmq 5218 5217 0 11:13 ? 00:00:00 inet_gethost 4
** Server is listening
[root@Node1 ~]# lsof -P -i tcp |grep rabbit|grep LIST
epmd 1710 rabbitmq  3u IPv4 10926 0t0 TCP *:4369 (LISTEN)
beam 5136 rabbitmq  7u IPv4 27914 0t0 TCP *:45343 (LISTEN)
beam 5136 rabbitmq 17u IPv6 27951 0t0 TCP *:5672 (LISTEN)
beam 5136 rabbitmq 19r IPv4 27984 0t0 TCP *:55672 (LISTEN)
**** Start Node2 ****
[root@Node2 ~]# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
** Verify that it is running
-bash-4.1$ ps -eaf|grep rabbit|grep -v grep
rabbitmq 2172 1    0 Jun27 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/epmd -daemon
root     5245 1    0 11:12 ? 00:00:00 /bin/sh /usr/sbin/rabbitmq-server
root     5257 5245 0 11:12 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 5260 5257 0 11:12 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/beam -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -noshell -noinput -sname rabbit@iup-app008 -boot /var/lib/rabbitmq/mnesia/rabbit@iup-app008-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@iup-app008.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@iup-app008-sasl.log"} -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@iup-app008"
rabbitmq 5340 5260 0 11:13 ? 00:00:00 inet_gethost 4
rabbitmq 5341 5340 0 11:13 ? 00:00:00 inet_gethost 4
rabbitmq 5346 5345 0 11:13 pts/0 00:00:00 -bash
** Server is listening
[root@Node2 ~]# lsof -P -i tcp |grep rabbit|grep LIST
epmd 2172 rabbitmq 3u IPv4 11583  0t0 TCP *:4369 (LISTEN)
beam 5260 rabbitmq 7u IPv4 164034 0t0 TCP *:51764 (LISTEN)
** Attempt to cluster Node2 with Node1 as rabbitmq user
-bash-4.1$ rabbitmqctl stop_app
Stopping node 'rabbit@Node2' ...
...done.
-bash-4.1$ rabbitmqctl reset
Resetting node 'rabbit@Node2' ...
...done.
-bash-4.1$ rabbitmqctl cluster rabbit@Node1 rabbit@Node2
Clustering node 'rabbit@Node2' with ['rabbit@Node1', 'rabbit@Node2'] ...
Error: {no_running_cluster_nodes,['rabbit@Node1'],['rabbit@Node1']}
-bash-4.1$
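(For reference, a sanity check worth scripting before "rabbitmqctl cluster": with -sname, the nodes address each other by short hostname, so no_running_cluster_nodes can simply mean Node2 cannot resolve Node1's name. A minimal sketch; "check_resolves" is an illustrative helper of mine, not a RabbitMQ tool, and "Node1" stands in for the real peer.)

```shell
# Sketch: confirm each cluster node's short hostname resolves from this host.
# "check_resolves" and "Node1" are illustrative, not RabbitMQ commands.
check_resolves() {
  if getent hosts "$1" > /dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: does not resolve"
  fi
}
check_resolves localhost
check_resolves Node1    # substitute the real peer hostname here
```

If the peer's name does not resolve, fix /etc/hosts or DNS before retrying the cluster command.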
***************************
**** ** Attempt # 2 ** ****
***************************
I tried once more from scratch, and thought I would get away with a shortcut... Fail:

rpm -e rabbitmq-server. Removed the contents of /var/lib/rabbitmq and /etc/rabbitmq. Deleted all the directories that rpm -e did not remove (Debian's purge is far nicer) and rebooted both of the RedHat servers. These are virtual boxes & the reboot cycle is about 2 mins.

After the servers came back I did this:
# rpm -i rabbitmq-server....rpm
Create /etc/rabbitmq/rabbitmq.config and add this line:
[{rabbit, [{cluster_nodes, ['rabbit@Node1', 'rabbit@Node2']}]}].
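(Aside: rabbitmq.config must be a valid Erlang term list terminated by a dot, and a syntax slip here can make the node fail at boot. A sketch of a pre-flight syntax check, assuming erl is on the PATH; the demo filename below stands in for /etc/rabbitmq/rabbitmq.config.)

```shell
# Sketch: syntax-check a rabbitmq.config-style file before starting the broker.
# "rabbitmq.config.demo" is a stand-in for the real /etc/rabbitmq/rabbitmq.config.
cat > rabbitmq.config.demo <<'EOF'
[{rabbit, [{cluster_nodes, ['rabbit@Node1', 'rabbit@Node2']}]}].
EOF
if command -v erl > /dev/null 2>&1; then
  # file:consult/1 parses a file of Erlang terms; it fails on any syntax error.
  erl -noshell -eval \
    'case file:consult("rabbitmq.config.demo") of
       {ok, _}    -> io:format("config ok~n");
       {error, E} -> io:format("bad config: ~p~n", [E])
     end, halt().'
else
  echo "erl not installed; skipped syntax check"
fi
```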
Node1:
# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server

Copy .erlang.cookie to Node2:
-r-------- 1 rabbitmq rabbitmq 20 Jun 28 11:35 .erlang.cookie
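(For the cookie to be accepted the copy must end up owned by rabbitmq with mode 400; Erlang refuses a cookie file readable by anyone else. A sketch of the permission step on a scratch file, since chown on the real path needs root; the path and value below are stand-ins for /var/lib/rabbitmq/.erlang.cookie.)

```shell
# Demo on a scratch file: the real cookie lives at /var/lib/rabbitmq/.erlang.cookie
# and must also be chown'ed rabbitmq:rabbitmq (omitted here - that needs root).
COOKIE=./erlang.cookie.demo
printf 'ABCDEFGHIJKLMNOPQRST' > "$COOKIE"   # stand-in for the real 20-char cookie
chmod 400 "$COOKIE"                         # owner-read only, as Erlang requires
stat -c '%a' "$COOKIE"                      # prints 400
```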
Node1:
# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}

# cat /var/log/rabbitmq/startup_err
Erlang has closed
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})
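(One thing worth ruling out after the copy step is a cookie mismatch: the two files must be byte-identical, which is easy to confirm by hashing. Demo with stand-in files; in production, hash /var/lib/rabbitmq/.erlang.cookie on each host and compare the digests.)

```shell
# Demo: confirm two cookie copies are byte-identical by comparing hashes.
# "cookie.node1"/"cookie.node2" are stand-ins; on the real hosts run
# "md5sum /var/lib/rabbitmq/.erlang.cookie" on each and compare by eye.
printf 'ABCDEFGHIJKLMNOPQRST' > cookie.node1
printf 'ABCDEFGHIJKLMNOPQRST' > cookie.node2
md5sum cookie.node1 cookie.node2   # matching digests => identical cookies
```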
On 27.06.2012 15:35, Simon Loewenthal wrote:
> Nope, I'm not reissuing the commands.
>
> One part I omitted was that there was a previous install of Rabbit on the server, but the rpm was removed, and an rm -r /var/lib/rabbitmq/* was done. The .erlang.cookie was also removed, and I double-checked that the mnesia files were definitely removed afore I did anything.
>
> On 27/06/12 15:31, Simon MacMullen wrote:
>
>> By "reclustering" I just mean reissuing "rabbitmqctl reset" "rabbitmqctl cluster ..." commands, and thus redefining the cluster.
>>
>> Alternatively, could something be removing files from /var/lib/rabbitmq/mnesia/ ?
>>
>> Cheers, Simon
>>
>> On 27/06/12 14:24, Simon Loewenthal wrote:
>>
>>> Hi Simon,
>>>
>>> I rebooted both servers: /# reboot/
>>> You've given a good reason to move to 2.8.n
>>>
>>> > Are you definitely not reclustering at any stage?
>>> Apologies, but I don't understand the term reclustering.
>>>
>>> S.
>>>
>>> On 27/06/12 15:20, Simon MacMullen wrote:
>>>
>>>> Hi.
>>>>
>>>> You say you "reboot the server" - is this node 1, node 2, or both?
>>>>
>>>> The description you have after rebooting the server is consistent with having one RAM node and one disc node and restarting both of them - in 2.7.0 if the RAM node came back up first it would create a blank database. This was fixed in 2.8.x, in that now a RAM node will refuse to start if there are no disc nodes to connect to.
>>>>
>>>> However, when you initially create the cluster you are creating both nodes as disc nodes. So I am puzzled as to how you could have got to the state where one node was a RAM node. Are you definitely not reclustering at any stage?
>>>>
>>>> Cheers, Simon
>>>>
>>>> On 27/06/12 12:48, S.Loewenthal wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I installed RabbitMQ server but cannot get one node to retain the cluster configuration. A step-by-step account follows of what I did to configure these nodes.
>>>>> Node 1: rabbit@iuu-7
>>>>> Node 2: rabbit@iuu-8
>>>>>
>>>>> On both nodes install 2.7.0 - this version because all other environments run the same version.
>>>>> # rpm -ivh rabbitmq-server-2.7.0-1.noarch.rpm
>>>>>
>>>>> Start Node 2 (if not already running)
>>>>> # /etc/init.d/rabbitmq-server start
>>>>>
>>>>> Shutdown RabbitMQ on Node 2
>>>>> # /etc/init.d/rabbitmq-server stop
>>>>>
>>>>> Copied the Erlang cookie so that both nodes have the same cookie, or else these won't talk with each other.
>>>>> Copy (Node 1) iuu-7:/var/lib/rabbitmq/.erlang.cookie to (Node 2) iuu-8:/var/lib/rabbitmq/.erlang.cookie
>>>>> Permissions and ownership are: perms 400 -- owner rabbitmq:rabbitmq
>>>>>
>>>>> Start RabbitMQ on Node 2
>>>>> # /etc/init.d/rabbitmq-server start
>>>>>
>>>>> Verify that each node can communicate with the other:
>>>>> On Node 1
>>>>> $ rabbitmqctl -n rabbit@iuu-8 status
>>>>> On Node 2
>>>>> $ rabbitmqctl -n rabbit@iuu-7 status
>>>>>
>>>>> Reset Node 2 so that it is ready to join the cluster on Node 1
>>>>> $ rabbitmqctl stop_app
>>>>> $ rabbitmqctl reset
>>>>>
>>>>> Add Node 2 to the first node with disc writing enabled on both nodes:
>>>>> $ rabbitmqctl cluster rabbit@iuu-7 rabbit@iuu-8
>>>>>
>>>>> Start the app on Node 2
>>>>> $ rabbitmqctl start_app
>>>>>
>>>>> Verify the cluster status on Node 2 and the same on Node 1
>>>>> $ rabbitmqctl cluster_status
>>>>> Cluster status of node 'rabbit@iuu-8' ...
>>>>> [{nodes,[{disc,['rabbit@iuu-8','rabbit@iuu-7']}]},
>>>>> {running_nodes,['rabbit@iuu-7','rabbit@iuu-8']}]
>>>>> ...done.
>>>>>
>>>>> Next I add a few users, and reboot the server. Now the cluster is disconnected, and I cannot get it to reconnect. Also, the users added afore the reboot no longer exist.
>>>>>
>>>>> It comes up without the cluster information or the users:
>>>>> [root@iuu-7 ~]# rabbitmqctl cluster_status
>>>>> Cluster status of node 'rabbit@iup-7' ...
>>>>> [{nodes,[{ram,['rabbit@iuu-7']}]},{running_nodes,['rabbit@iuu-7']}]
>>>>> ...done.
>>>>>
>>>>> [root@iup-8 ~]# rabbitmqctl cluster_status
>>>>> Cluster status of node 'rabbit@iup-' ...
>>>>> [{nodes,[{disc,['rabbit@iup-8']},{ram,['rabbit@iup-7']}]},
>>>>> {running_nodes,['rabbit@iup-8']}]
>>>>> ...done.
>>>>>
>>>>> The cluster configuration does not survive a server restart, and in this case the user accounts were lost. I imagine that I have misconfigured something.
>>>>>
>>>>> ** The question **
>>>>> How can I ensure that the cluster reconnects after rabbitmq-server restarts, and that data that was written is retained? Or, what did I misconfigure?
>>>>>
>>>>> Many thanks, S.
>>>>>
>>>>> _______________________________________________
>>>>> rabbitmq-discuss mailing list
>>>>> rabbitmq-discuss at lists.rabbitmq.com
>>>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
> --
> PGP is optional: 4BA78604
> simon @ klunky . org
> simon @ klunky . co.uk
> I won't accept your confidentiality agreement, and your Emails are kept.
> ~Ö¿Ö~
--
Dogs are tough. I've been interrogating this one for hours and he still won't tell me who's a good boy.
simon at klunky.co.uk www.klunky.org