[rabbitmq-discuss] RabbitMQ 2.8.2 with Heartbeat/DRBD fails to start

askgax askgax at pchome.com.tw
Thu May 17 14:54:49 BST 2012


Hi Emile,

Thanks for your reply.

My MQ server has 4 GB of RAM; here is the disk usage:

[root@mq1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/vda3              26G  5.6G   19G  24% /
/dev/vda1              99M   12M   82M  13% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/drbd1             20G  176M   19G   1% /media/drbd1
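
Emile's question comes down to comparing the free space on the partition that holds RabbitMQ's data with the installed RAM. Going by the df output above, /media/drbd1 has 19G available against 4 GB of RAM. The Python sketch below makes that check repeatable. It assumes (this is not stated in the thread) that RabbitMQ's data directory lives on the DRBD mount /media/drbd1, the helper names are hypothetical, and it only compares the two numbers; it does not read the disk free limit RabbitMQ itself is configured with.

```python
import os
import shutil


def free_disk_bytes(path: str) -> int:
    """Space available to non-root users on the filesystem holding `path` (df's "Avail")."""
    return shutil.disk_usage(path).free


def total_ram_bytes() -> int:
    """Total physical RAM in bytes, via sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def disk_below_ram(free_bytes: int, ram_bytes: int) -> bool:
    """True when free disk space is smaller than installed RAM."""
    return free_bytes < ram_bytes


if __name__ == "__main__":
    # Assumption: RabbitMQ's data directory is on the DRBD mount.
    data_path = "/media/drbd1"
    if os.path.ismount(data_path):
        free, ram = free_disk_bytes(data_path), total_ram_bytes()
        print(f"{data_path}: {free / 2**30:.1f} GiB free, RAM {ram / 2**30:.1f} GiB")
        print("free disk below RAM:", disk_below_ram(free, ram))
    else:
        # On the DRBD secondary the mount point exists but nothing is mounted.
        print(f"{data_path} is not mounted on this node")
```

With the figures from this thread (19G free, 4 GB RAM), `disk_below_ram` returns False, so by this comparison free space is not below RAM.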

The RabbitMQ 2.7.1 and RabbitMQ 2.8.2 environments run on the same hardware
spec.

In the RabbitMQ 2.8.2 environment, when I execute a Heartbeat takeover, the
debug log shows no errors related to RabbitMQ:

May 17 21:42:08 mq1 heartbeat: [2311]: debug: StartNextRemoteRscReq() -
calling hook
May 17 21:42:08 mq1 heartbeat: [2311]: debug: notify_world: invoking harc:
OLD status: active
May 17 21:42:08 mq1 heartbeat: [2311]: debug: Process [hb_takeover] started
pid 2561
May 17 21:42:08 mq1 heartbeat: [2311]: debug: Starting notify process
[hb_takeover]
May 17 21:42:08 mq1 heartbeat: [2561]: debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
May 17 21:42:08 mq1 heartbeat: [2561]: debug: notify_world: Running harc
hb_takeover
harc[2561]:     2012/05/17_21:42:08 info: Running
/etc/ha.d//rc.d/hb_takeover hb_takeover
May 17 21:42:08 mq1 heartbeat: [2311]: info: Managed hb_takeover process
2561 exited with return code 0.
May 17 21:42:08 mq1 heartbeat: [2311]: debug: RscMgmtProc 'hb_takeover'
exited code 0
May 17 21:42:09 mq1 heartbeat: [2311]: debug: Received standby message me
from mq2 in state 0
May 17 21:42:09 mq1 heartbeat: [2311]: debug: ask_for_resources: other now
unstable
May 17 21:42:09 mq1 heartbeat: [2311]: info: mq2 wants to go standby [all]
May 17 21:42:09 mq1 heartbeat: [2311]: info: standby: other_holds_resources:
3
May 17 21:42:09 mq1 heartbeat: [2311]: debug: Sending standby [other] msg
May 17 21:42:09 mq1 heartbeat: [2311]: debug: Received standby message other
from mq1 in state 2
May 17 21:42:09 mq1 heartbeat: [2311]: info: New standby state: 2
May 17 21:42:09 mq1 heartbeat: [2311]: info: New standby state: 2
May 17 21:42:09 mq1 heartbeat: [2311]: debug: process_resources(2):  other
now unstable
May 17 21:42:09 mq1 heartbeat: [2311]: info: other_holds_resources: 0
May 17 21:42:09 mq1 heartbeat: [2311]: debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 2, standby running(ms): 4395531970, resourcestate: 4
May 17 21:42:09 mq1 ipfail: [2341]: debug: Other side is unstable.
May 17 21:42:11 mq1 heartbeat: [2311]: debug: Received standby message done
from mq2 in state 2
May 17 21:42:11 mq1 heartbeat: [2311]: info: standby: acquire [all]
resources from mq2
May 17 21:42:11 mq1 heartbeat: [2311]: debug: Process [go_standby] started
pid 2612
May 17 21:42:11 mq1 heartbeat: [2311]: info: New standby state: 3
May 17 21:42:11 mq1 heartbeat: [2612]: info: acquire all HA resources
(standby).
May 17 21:42:11 mq1 heartbeat: [2612]: info: go_standby: who: 2 resource
set: all
May 17 21:42:11 mq1 heartbeat: [2612]: info: go_standby: (query/action):
(allkeys/takegroup)
ResourceManager[2625]:  2012/05/17_21:42:11 info: Acquiring resource group:
mq1 172.19.253.99/16/eth0 drbddisk::drbd1
Filesystem::/dev/drbd1::/media/drbd1 rabbitmq-server
IPaddr[2654]:   2012/05/17_21:42:11 INFO:  Resource is stopped
ResourceManager[2625]:  2012/05/17_21:42:11 info: Running
/etc/ha.d/resource.d/IPaddr 172.19.253.99/16/eth0 start
ResourceManager[2625]:  2012/05/17_21:42:11 debug: Starting
/etc/ha.d/resource.d/IPaddr 172.19.253.99/16/eth0 start
IPaddr[2748]:   2012/05/17_21:42:11 INFO: Using calculated netmask for
172.19.253.99: 255.255.0.0
IPaddr[2748]:   2012/05/17_21:42:11 DEBUG: Using calculated broadcast for
172.19.253.99: 172.19.255.255
IPaddr[2748]:   2012/05/17_21:42:11 INFO: eval ifconfig eth0:0 172.19.253.99
netmask 255.255.0.0 broadcast 172.19.255.255
IPaddr[2748]:   2012/05/17_21:42:11 DEBUG: Sending Gratuitous Arp for
172.19.253.99 on eth0:0 [eth0]
IPaddr[2722]:   2012/05/17_21:42:12 INFO:  Success
INFO:  Success
ResourceManager[2625]:  2012/05/17_21:42:12 debug:
/etc/ha.d/resource.d/IPaddr 172.19.253.99/16/eth0 start done. RC=0
ResourceManager[2625]:  2012/05/17_21:42:12 info: Running
/etc/ha.d/resource.d/drbddisk drbd1 start
ResourceManager[2625]:  2012/05/17_21:42:12 debug: Starting
/etc/ha.d/resource.d/drbddisk drbd1 start
ResourceManager[2625]:  2012/05/17_21:42:12 debug:
/etc/ha.d/resource.d/drbddisk drbd1 start done. RC=0
Filesystem[2916]:       2012/05/17_21:42:12 INFO:  Resource is stopped
ResourceManager[2625]:  2012/05/17_21:42:12 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /media/drbd1 start
ResourceManager[2625]:  2012/05/17_21:42:12 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /media/drbd1 start
Filesystem[2999]:       2012/05/17_21:42:12 INFO: Running start for
/dev/drbd1 on /media/drbd1
Filesystem[2999]:       2012/05/17_21:42:12 INFO: Starting filesystem check
on /dev/drbd1
fsck 1.39 (29-May-2006)
/dev/drbd1: clean, 442/2621440 files, 127224/5242683 blocks
Filesystem[2991]:       2012/05/17_21:42:13 INFO:  Success
INFO:  Success
ResourceManager[2625]:  2012/05/17_21:42:13 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /media/drbd1 start done. RC=0
May 17 21:42:13 mq1 heartbeat: [2612]: info: all HA resource acquisition
completed (standby).
May 17 21:42:13 mq1 heartbeat: [2612]: debug: Sending standby [done] msg
May 17 21:42:13 mq1 heartbeat: [2612]: info: FIFO message [type
ask_resources] written rc=47
May 17 21:42:13 mq1 heartbeat: [2311]: debug: Received standby message done
from mq1 in state 3
May 17 21:42:13 mq1 heartbeat: [2311]: info: Standby resource acquisition
done [all].
May 17 21:42:13 mq1 heartbeat: [2311]: debug: Sending hold resources msg:
all, stable=1 # <none>
May 17 21:42:13 mq1 heartbeat: [2311]: info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (1))
May 17 21:42:13 mq1 heartbeat: [2311]: debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
May 17 21:42:13 mq1 heartbeat: [2311]: info: New standby state: 0
May 17 21:42:13 mq1 heartbeat: [2311]: info: Managed go_standby process 2612
exited with return code 0.
May 17 21:42:13 mq1 heartbeat: [2311]: debug: RscMgmtProc 'go_standby'
exited code 0
May 17 21:42:13 mq1 heartbeat: [2311]: info: remote resource transition
completed.
May 17 21:42:13 mq1 heartbeat: [2311]: debug: Sending hold resources msg:
all, stable=1 # <none>
May 17 21:42:13 mq1 heartbeat: [2311]: info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (1))
May 17 21:42:13 mq1 heartbeat: [2311]: debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
May 17 21:42:13 mq1 heartbeat: [2311]: debug: Calling PerformAutoFailback()
May 17 21:42:13 mq1 heartbeat: [2311]: info: other_holds_resources: 0
May 17 21:42:13 mq1 heartbeat: [2311]: debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
May 17 21:42:13 mq1 heartbeat: [2311]: info: other_holds_resources: 0
May 17 21:42:13 mq1 heartbeat: [2311]: debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
May 17 21:42:13 mq1 ipfail: [2341]: debug: Other side is now stable.
May 17 21:42:13 mq1 ipfail: [2341]: debug: Other side is now stable.
ARPING 172.19.253.99 from 172.19.253.99 eth0
Sent 10 probes (10 broadcast(s))
Received 0 response(s)


With the same environment settings, a Heartbeat takeover on RabbitMQ 2.7.1
does show RabbitMQ being started:

May 17 18:09:26 mqhand1 heartbeat: [9216]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
harc[9216]:     2012/05/17_18:09:27 info: Running
/etc/ha.d//rc.d/hb_takeover hb_takeover
May 17 18:09:27 mqhand1 heartbeat: [2216]: info: mqhand2 wants to go standby
[all]
May 17 18:09:27 mqhand1 ipfail: [2243]: debug: Other side is unstable.
May 17 18:09:30 mqhand1 heartbeat: [2216]: info: standby: acquire [all]
resources from mqhand2
May 17 18:09:30 mqhand1 heartbeat: [9233]: info: acquire all HA resources
(standby).
ResourceManager[9246]:  2012/05/17_18:09:31 info: Acquiring resource group:
mqhand1 172.19.253.100/24/eth0 drbddisk::drbd1
Filesystem::/dev/drbd1::/media/drbd1 rabbitmq-server
IPaddr[9274]:   2012/05/17_18:09:31 INFO:  Resource is stopped
ResourceManager[9246]:  2012/05/17_18:09:31 info: Running
/etc/ha.d/resource.d/IPaddr 172.19.253.100/24/eth0 start
IPaddr[9361]:   2012/05/17_18:09:31 INFO: Using calculated netmask for
172.19.253.100: 255.255.255.0
IPaddr[9361]:   2012/05/17_18:09:31 INFO: eval ifconfig eth0:0
172.19.253.100 netmask 255.255.255.0 broadcast 172.19.253.255
IPaddr[9335]:   2012/05/17_18:09:31 INFO:  Success
INFO:  Success
ResourceManager[9246]:  2012/05/17_18:09:31 info: Running
/etc/ha.d/resource.d/drbddisk drbd1 start
Filesystem[9498]:       2012/05/17_18:09:31 INFO:  Resource is stopped
ResourceManager[9246]:  2012/05/17_18:09:31 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /media/drbd1 start
Filesystem[9577]:       2012/05/17_18:09:31 INFO: Running start for
/dev/drbd1 on /media/drbd1
Filesystem[9577]:       2012/05/17_18:09:31 INFO: Starting filesystem check
on /dev/drbd1
fsck 1.39 (29-May-2006)
/dev/drbd1: clean, 863/10485760 files, 380283/20970783 blocks (check in 4
mounts)
Filesystem[9569]:       2012/05/17_18:09:31 INFO:  Success
INFO:  Success
ResourceManager[9246]:  2012/05/17_18:09:31 info: Running
/etc/init.d/rabbitmq-server  start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
May 17 18:09:36 mqhand1 heartbeat: [9233]: info: all HA resource acquisition
completed (standby).
May 17 18:09:36 mqhand1 heartbeat: [2216]: info: Standby resource
acquisition done [all].
May 17 18:09:36 mqhand1 heartbeat: [2216]: info: remote resource transition
completed.
May 17 18:09:36 mqhand1 ipfail: [2243]: debug: Other side is now stable.
May 17 18:09:36 mqhand1 ipfail: [2243]: debug: Other side is now stable.
ARPING 172.19.253.100 from 172.19.253.100 eth0
Sent 10 probes (10 broadcast(s))
Received 0 response(s)

So I'm confused: with the same environment settings, why does RabbitMQ 2.8.2
not start during a Heartbeat takeover?
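
Putting the two logs side by side, the 2.7.1 takeover contains the ResourceManager line "Running /etc/init.d/rabbitmq-server  start", whereas in the 2.8.2 takeover ResourceManager starts IPaddr, drbddisk and Filesystem and then reports "all HA resource acquisition completed" with no start line for rabbitmq-server. The following Python sketch makes that comparison mechanical: it reads the group from the "Acquiring resource group" line and lists members that never got a "Running ... start" line. The helper names are hypothetical, it assumes unwrapped log lines as heartbeat writes them (the copies in this mail are wrapped), and it only shows that the start action was never run; it does not explain why.

```python
import os
import re

# Patterns for heartbeat ResourceManager lines, e.g.
#   ResourceManager[2625]: ... info: Acquiring resource group: mq1 <res> <res> ...
#   ResourceManager[2625]: ... info: Running /etc/ha.d/resource.d/IPaddr ... start
GROUP_RE = re.compile(r"ResourceManager\[\d+\]:.*info: Acquiring resource group: (.+)$")
START_RE = re.compile(r"ResourceManager\[\d+\]:.*info: Running (\S+).*\sstart\s*$")


def resource_type(token: str) -> str:
    """Map a haresources token to the script name heartbeat runs for it."""
    if token[0].isdigit():  # a bare IP address is shorthand for IPaddr
        return "IPaddr"
    return token.split("::", 1)[0]  # "drbddisk::drbd1" -> "drbddisk"


def unstarted_resources(log_lines):
    """Members of the most recent group with no 'Running <script> ... start' line."""
    wanted, started = [], set()
    for line in log_lines:
        if m := GROUP_RE.search(line):
            # First token is the node name (mq1 / mqhand1); the rest are resources.
            wanted = [resource_type(t) for t in m.group(1).split()[1:]]
            started.clear()  # a new acquisition starts a fresh comparison
        elif m := START_RE.search(line):
            started.add(os.path.basename(m.group(1)))
    return [r for r in wanted if r not in started]


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:  # heartbeat debug log file(s) to check
        with open(path) as f:
            missing = unstarted_resources(f)
        print(path, "->", missing or "all group resources were started")
```

Once the wrapped lines are rejoined, the 2.8.2 excerpt above yields `['rabbitmq-server']` and the 2.7.1 excerpt yields an empty list.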


Emile Joubert-2 wrote:
> 
> Hi,
> 
> On 17/05/12 03:43, askgax wrote:
>> I built a simple high-availability environment with RabbitMQ 2.7.1/Erlang
>> R14B03, Heartbeat 3.0.4 and DRBD 8.3.11; it worked fine and was stable.
>> Last week I upgraded RabbitMQ to 2.8.2
> 
> [...]
> 
> Was there an error message in the broker logfile about the disk free
> space limit? Do you have less free disk space than RAM on the partition
> storing RabbitMQ data? If so then you should configure the disk free
> limit to a less conservative value.
> 
> If disk free space is not the cause then the logfile should still
> indicate the cause. Does it contain any error entries at startup?
> 
> 
> -Emile
> 
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> 
> 



