[rabbitmq-discuss] [Q] Cluster configuration is not retained by node after restart
S.Loewenthal
simon at klunky.co.uk
Thu Jun 28 11:00:29 BST 2012
Hi,

I started again with a fresh installation of 2.7.1. There remain errors when I ask it to join a cluster. There is nothing recorded in /var/log/rabbitmq/shutdown_err. I have replaced the server names with Node1 and Node2 respectively to make it easier to follow.

Any ideas?
***************************
**** ** Attempt # 1 ** ****
***************************
rpm -e rabbitmq-server. Removed the contents of /var/lib/rabbitmq and /etc/rabbitmq, then rpm -i rabbitmq-server....rpm
**** Start Node1 ****
[root@Node1 ~]# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
** Verify that the server is really running.
[root@Node1 ~]# ps -aef|grep ra
root     3    2    0 Jun27 ? 00:00:00 [migration/0]
root     5    2    0 Jun27 ? 00:00:00 [migration/0]
rabbitmq 1710 1    0 Jun27 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/epmd -daemon
root     5122 1    0 11:13 ? 00:00:00 /bin/sh /usr/sbin/rabbitmq-server
root     5133 5122 0 11:13 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 5136 5133 3 11:13 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/beam -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -noshell -noinput -sname rabbit@Node1 -boot /var/lib/rabbitmq/mnesia/rabbit@Node1-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@Node1.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@Node1-sasl.log"} -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@Node1"
rabbitmq 5216 5136 0 11:13 ? 00:00:00 /usr/lib64/erlang/lib/os_mon-2.2.7/priv/bin/cpu_sup
rabbitmq 5217 5136 0 11:13 ? 00:00:00 inet_gethost 4
rabbitmq 5218 5217 0 11:13 ? 00:00:00 inet_gethost 4
** Server is listening
[root@Node1 ~]# lsof -P -i tcp |grep rabbit|grep LIST
epmd 1710 rabbitmq  3u IPv4 10926 0t0 TCP *:4369 (LISTEN)
beam 5136 rabbitmq  7u IPv4 27914 0t0 TCP *:45343 (LISTEN)
beam 5136 rabbitmq 17u IPv6 27951 0t0 TCP *:5672 (LISTEN)
beam 5136 rabbitmq 19r IPv4 27984 0t0 TCP *:55672 (LISTEN)
**** Start Node2 ****
[root@Node2 ~]# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
** Verify that it is running
-bash-4.1$ ps -eaf|grep rabbit|grep -v grep
rabbitmq 2172 1    0 Jun27 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/epmd -daemon
root     5245 1    0 11:12 ? 00:00:00 /bin/sh /usr/sbin/rabbitmq-server
root     5257 5245 0 11:12 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 5260 5257 0 11:12 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/beam -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -noshell -noinput -sname rabbit@iup-app008 -boot /var/lib/rabbitmq/mnesia/rabbit@iup-app008-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@iup-app008.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@iup-app008-sasl.log"} -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@iup-app008"
rabbitmq 5340 5260 0 11:13 ? 00:00:00 inet_gethost 4
rabbitmq 5341 5340 0 11:13 ? 00:00:00 inet_gethost 4
rabbitmq 5346 5345 0 11:13 pts/0 00:00:00 -bash
** Server is listening
[root@Node2 ~]# lsof -P -i tcp |grep rabbit|grep LIST
epmd 2172 rabbitmq 3u IPv4 11583  0t0 TCP *:4369 (LISTEN)
beam 5260 rabbitmq 7u IPv4 164034 0t0 TCP *:51764 (LISTEN)
** Attempt to cluster Node2 with Node1 as rabbitmq user
-bash-4.1$ rabbitmqctl stop_app
Stopping node 'rabbit@Node2' ...
...done.
-bash-4.1$ rabbitmqctl reset
Resetting node 'rabbit@Node2' ...
...done.
-bash-4.1$ rabbitmqctl cluster rabbit@Node1 rabbit@Node2
Clustering node 'rabbit@Node2' with ['rabbit@Node1', 'rabbit@Node2'] ...
Error: {no_running_cluster_nodes,['rabbit@Node1'],['rabbit@Node1']}
-bash-4.1$
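(For reference, a sanity check worth scripting before "rabbitmqctl cluster": with -sname, the nodes address each other by short hostname, so no_running_cluster_nodes can simply mean Node2 cannot resolve Node1's name. A minimal sketch; "check_resolves" is an illustrative helper of mine, not a RabbitMQ tool, and "Node1" stands in for the real peer.)

```shell
# Sketch: confirm each cluster node's short hostname resolves from this host.
# "check_resolves" and "Node1" are illustrative, not RabbitMQ commands.
check_resolves() {
  if getent hosts "$1" > /dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: does not resolve"
  fi
}
check_resolves localhost
check_resolves Node1    # substitute the real peer hostname here
```

If the peer's name does not resolve, fix /etc/hosts or DNS before retrying the cluster command.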
***************************
**** ** Attempt # 2 ** ****
***************************
I tried once more from scratch, and thought I would get away with a shortcut... Fail:

rpm -e rabbitmq-server. Removed the contents of /var/lib/rabbitmq and /etc/rabbitmq. Deleted all the directories that rpm -e did not remove (Debian's purge is far nicer) and rebooted both of the RedHat servers. These are virtual boxes & the reboot cycle is about 2 mins.

After the servers came back I did this:
# rpm -i rabbitmq-server....rpm
Create /etc/rabbitmq/rabbitmq.config and add this line:
[{rabbit, [{cluster_nodes, ['rabbit@Node1', 'rabbit@Node2']}]}].
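(Aside: rabbitmq.config must be a valid Erlang term list terminated by a dot, and a syntax slip here can make the node fail at boot. A sketch of a pre-flight syntax check, assuming erl is on the PATH; the demo filename below stands in for /etc/rabbitmq/rabbitmq.config.)

```shell
# Sketch: syntax-check a rabbitmq.config-style file before starting the broker.
# "rabbitmq.config.demo" is a stand-in for the real /etc/rabbitmq/rabbitmq.config.
cat > rabbitmq.config.demo <<'EOF'
[{rabbit, [{cluster_nodes, ['rabbit@Node1', 'rabbit@Node2']}]}].
EOF
if command -v erl > /dev/null 2>&1; then
  # file:consult/1 parses a file of Erlang terms; it fails on any syntax error.
  erl -noshell -eval \
    'case file:consult("rabbitmq.config.demo") of
       {ok, _}    -> io:format("config ok~n");
       {error, E} -> io:format("bad config: ~p~n", [E])
     end, halt().'
else
  echo "erl not installed; skipped syntax check"
fi
```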
Node1:
# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server

Copy .erlang.cookie to Node2:
-r-------- 1 rabbitmq rabbitmq 20 Jun 28 11:35 .erlang.cookie
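(For the cookie to be accepted the copy must end up owned by rabbitmq with mode 400; Erlang refuses a cookie file readable by anyone else. A sketch of the permission step on a scratch file, since chown on the real path needs root; the path and value below are stand-ins for /var/lib/rabbitmq/.erlang.cookie.)

```shell
# Demo on a scratch file: the real cookie lives at /var/lib/rabbitmq/.erlang.cookie
# and must also be chown'ed rabbitmq:rabbitmq (omitted here - that needs root).
COOKIE=./erlang.cookie.demo
printf 'ABCDEFGHIJKLMNOPQRST' > "$COOKIE"   # stand-in for the real 20-char cookie
chmod 400 "$COOKIE"                         # owner-read only, as Erlang requires
stat -c '%a' "$COOKIE"                      # prints 400
```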
Node1:
# /etc/init.d/rabbitmq-server start
Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}

# cat /var/log/rabbitmq/startup_err
Erlang has closed
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})
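(One thing worth ruling out after the copy step is a cookie mismatch: the two files must be byte-identical, which is easy to confirm by hashing. Demo with stand-in files; in production, hash /var/lib/rabbitmq/.erlang.cookie on each host and compare the digests.)

```shell
# Demo: confirm two cookie copies are byte-identical by comparing hashes.
# "cookie.node1"/"cookie.node2" are stand-ins; on the real hosts run
# "md5sum /var/lib/rabbitmq/.erlang.cookie" on each and compare by eye.
printf 'ABCDEFGHIJKLMNOPQRST' > cookie.node1
printf 'ABCDEFGHIJKLMNOPQRST' > cookie.node2
md5sum cookie.node1 cookie.node2   # matching digests => identical cookies
```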
On 27.06.2012 15:35, Simon Loewenthal wrote:
> Nope, I'm not reissuing the commands.
>
> One part I omitted was that there was a previous install of Rabbit on the server, but the rpm was removed, and an rm -r /var/lib/rabbitmq/* was done. The .erlang.cookie was also removed, and I double-checked that the mnesia files were definitely removed afore I did anything.
>
> On 27/06/12 15:31, Simon MacMullen wrote:
>
>> By "reclustering" I just mean reissuing "rabbitmqctl reset" "rabbitmqctl cluster ..." commands, and thus redefining the cluster.
>>
>> Alternatively, could something be removing files from /var/lib/rabbitmq/mnesia/ ?
>>
>> Cheers, Simon
>>
>> On 27/06/12 14:24, Simon Loewenthal wrote:
>>
>>> Hi Simon,
>>>
>>> I rebooted both servers: /# reboot/
>>> You've given a good reason to move to 2.8.n
>>>
>>> > Are you definitely not reclustering at any stage?
>>> Apologies, but I don't understand the term reclustering.
>>>
>>> S.
>>>
>>> On 27/06/12 15:20, Simon MacMullen wrote:
>>>
>>>> Hi.
>>>>
>>>> You say you "reboot the server" - is this node 1, node 2, or both?
>>>>
>>>> The description you have after rebooting the server is consistent with having one RAM node and one disc node and restarting both of them - in 2.7.0 if the RAM node came back up first it would create a blank database. This was fixed in 2.8.x, in that now a RAM node will refuse to start if there are no disc nodes to connect to.
>>>>
>>>> However, when you initially create the cluster you are creating both nodes as disc nodes. So I am puzzled as to how you could have got to the state where one node was a RAM node. Are you definitely not reclustering at any stage?
>>>>
>>>> Cheers, Simon
>>>>
>>>> On 27/06/12 12:48, S.Loewenthal wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I installed RabbitMQ server but cannot get one node to retain the cluster configuration. A step-by-step account follows of what I did to configure these nodes.
>>>>> Node 1: rabbit@iuu-7
>>>>> Node 2: rabbit@iuu-8
>>>>>
>>>>> On both nodes install 2.7.0 - this version because all other environments run the same version.
>>>>> # rpm -ivh rabbitmq-server-2.7.0-1.noarch.rpm
>>>>>
>>>>> Start Node 2 (if not already running)
>>>>> # /etc/init.d/rabbitmq-server start
>>>>>
>>>>> Shutdown RabbitMQ on Node 2
>>>>> # /etc/init.d/rabbitmq-server stop
>>>>>
>>>>> Copied the Erlang cookie so that both nodes have the same cookie, or else these won't talk with each other.
>>>>> Copy (Node 1) iuu-7:/var/lib/rabbitmq/.erlang.cookie to (Node 2) iuu-8:/var/lib/rabbitmq/.erlang.cookie
>>>>> Permissions and ownership are: perms 400 -- owner rabbitmq:rabbitmq
>>>>>
>>>>> Start RabbitMQ on Node 2
>>>>> # /etc/init.d/rabbitmq-server start
>>>>>
>>>>> Verify that each node can communicate with the other:
>>>>> On Node 1
>>>>> $ rabbitmqctl -n rabbit@iuu-8 status
>>>>> On Node 2
>>>>> $ rabbitmqctl -n rabbit@iuu-7 status
>>>>>
>>>>> Reset Node 2 so that it is ready to join the cluster on Node 1
>>>>> $ rabbitmqctl stop_app
>>>>> $ rabbitmqctl reset
>>>>>
>>>>> Add Node 2 to the first node with disc writing enabled on both nodes:
>>>>> $ rabbitmqctl cluster rabbit@iuu-7 rabbit@iuu-8
>>>>>
>>>>> Start the app on Node 2
>>>>> $ rabbitmqctl start_app
>>>>>
>>>>> Verify the cluster status on Node 2 and the same on Node 1
>>>>> $ rabbitmqctl cluster_status
>>>>> Cluster status of node 'rabbit@iuu-8' ...
>>>>> [{nodes,[{disc,['rabbit@iuu-8','rabbit@iuu-7']}]},
>>>>> {running_nodes,['rabbit@iuu-7','rabbit@iuu-8']}]
>>>>> ...done.
>>>>>
>>>>> Next I add a few users, and reboot the server. Now the cluster is disconnected, and I cannot get it to reconnect. Also, the users added afore the reboot no longer exist.
>>>>>
>>>>> It comes up without the cluster information or the users:
>>>>> [root@iuu-7 ~]# rabbitmqctl cluster_status
>>>>> Cluster status of node 'rabbit@iup-7' ...
>>>>> [{nodes,[{ram,['rabbit@iuu-7']}]},{running_nodes,['rabbit@iuu-7']}]
>>>>> ...done.
>>>>>
>>>>> [root@iup-8 ~]# rabbitmqctl cluster_status
>>>>> Cluster status of node 'rabbit@iup-' ...
>>>>> [{nodes,[{disc,['rabbit@iup-8']},{ram,['rabbit@iup-7']}]},
>>>>> {running_nodes,['rabbit@iup-8']}]
>>>>> ...done.
>>>>>
>>>>> The cluster configuration does not survive a server restart, and in this case the user accounts were lost. I imagine that I have misconfigured something.
>>>>>
>>>>> ** The question **
>>>>> How can I ensure that the cluster reconnects after rabbitmq-server restarts, and that data that was written is retained? Or, what did I misconfigure?
>>>>>
>>>>> Many thanks, S.
>>>>>
>>>>> _______________________________________________
>>>>> rabbitmq-discuss mailing list
>>>>> rabbitmq-discuss at lists.rabbitmq.com
>>>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
> --
> PGP is optional: 4BA78604
> simon @ klunky . org
> simon @ klunky . co.uk
> I won't accept your confidentiality agreement, and your Emails are kept.
> ~Ö¿Ö~
--
Dogs are tough. I've been interrogating this one for hours and he still won't tell me who's a good boy.
simon at klunky.co.uk www.klunky.org