[rabbitmq-discuss] BUG: Failing nodes with HA-queue stops message flow
Simon Lundström
simlu at su.se
Mon Oct 1 13:03:16 BST 2012
Hello!
As per requested by Emile on IRC:
We get stale messages in the queue when the node that is "master" in a
HA-queue gets restarted, details below:
We are running two nodes, ram01 and disk01, on 2.8.7.
ram01 is a ram node and, surprise surprise, disk01 is a disc node.
We are using the amqp gem on Ruby for producer and consumer
(sourcecode available on demand).
The producer connects to ram01 and creates a fanout exchange which is
durable. It then publishes X # of messages on the exchange.
The consumer connects to disk01 connects to disk01 and sets up an error
handler to reconnect if it gets an 320 error.
The consumer creates a durable queue which has x-ha-policy = all. It
when binds to the exchange.
The queue has disk01 as a node and ram01 as a mirror.
Reproduceable steps:
# Restart disk01 (via `rabbitmqctl stop` and start it up again).
# The consumer reconnects because it noticed the 320.
consumer log:
[connection.close] Reply code = 320, reply text = CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
[connection.close] Setting up a periodic reconnection timer...
# In the queue, ram01 is now node and disk01 is a mirror.
# Restart ram01 (via the same procedure as disk01)
# In the management web GUI under the queue (simlu.queue) the consumer is not listed as connected to the queue.
However, if I go to "Connections" I can see that the consumer is connected to disk01. According to the log the consumer is also connected.
# Publish message via the producer
# We now have 1 message "Ready" in the queue.
The consumer does not get any message.
# If we either:
* Restart disk01 or
* Restart consumer
the message gets delivered.
In this case, we restarted disk01.
# Message is delivered.
On ram01 we get this message in the log:
=ERROR REPORT==== 1-Oct-2012::13:55:35 ===
Discarding message {'$gen_call',{<0.173.0>,#Ref<0.0.0.967>},{notify_down,<6733.320.0>}} from <0.173.0> to <0.200.0> in an old incarnation (1) of this node (2)
Log output from disk01 and ram01 is attached.
Thanks!
- Simon
---
Simon Lundström
IT Services
Stockholm University
-------------- next part --------------
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
Mirrored-queue (queue 'simlu.queue' in vhost '/'): Slave <'rabbit at rabbitmq-test-ram01.it.su.se'.1.200.0> saw deaths of mirrors <'rabbit at rabbitmq-test-disk01.it.su.se'.2.250.0>
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
Mirrored-queue (queue 'simlu.queue' in vhost '/'): Promoting slave <'rabbit at rabbitmq-test-ram01.it.su.se'.1.200.0> to master
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
rabbit on node 'rabbit at rabbitmq-test-disk01.it.su.se' down
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
only running disc node went down
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
rabbit on node 'rabbit at rabbitmq-test-disk01.it.su.se' up
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
cluster contains disc nodes again
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
Stopping Rabbit
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
application: rabbitmq_management_agent
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
stopped SSL Listener on [::]:5671
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
application: rabbit
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
application: os_mon
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
application: mnesia
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
Halting Erlang VM
=INFO REPORT==== 1-Oct-2012::13:50:44 ===
Limiting to approx 924 file handles (829 sockets)
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
Memory limit set to 805MB of 2013MB total.
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
Disk free limit set to 1000MB
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
Management plugin upgraded statistics to fine.
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index
=WARNING REPORT==== 1-Oct-2012::13:50:45 ===
msg_store_persistent: recovery terms differ from present
rebuilding indices from scratch
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
Adding mirror of queue 'simlu.queue' in vhost '/' on node 'rabbit at rabbitmq-test-ram01.it.su.se': <0.200.0>
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
started SSL Listener on [::]:5671
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
rabbit on node 'rabbit at rabbitmq-test-disk01.it.su.se' up
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
cluster contains disc nodes again
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
Management agent started.
=INFO REPORT==== 1-Oct-2012::13:53:35 ===
accepting AMQP connection <0.253.0> (130.237.95.133:55630 -> 77.238.35.70:5671)
=INFO REPORT==== 1-Oct-2012::13:53:36 ===
closing AMQP connection <0.253.0> (130.237.95.133:55630 -> 77.238.35.70:5671)
=ERROR REPORT==== 1-Oct-2012::13:55:35 ===
Discarding message {'$gen_call',{<0.173.0>,#Ref<0.0.0.967>},{notify_down,<6733.320.0>}} from <0.173.0> to <0.200.0> in an old incarnation (1) of this node (2)
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
Mirrored-queue (queue 'simlu.queue' in vhost '/'): Slave <'rabbit at rabbitmq-test-ram01.it.su.se'.2.200.0> saw deaths of mirrors <'rabbit at rabbitmq-test-disk01.it.su.se'.3.250.0>
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
Mirrored-queue (queue 'simlu.queue' in vhost '/'): Promoting slave <'rabbit at rabbitmq-test-ram01.it.su.se'.2.200.0> to master
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
rabbit on node 'rabbit at rabbitmq-test-disk01.it.su.se' down
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
only running disc node went down
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
rabbit on node 'rabbit at rabbitmq-test-disk01.it.su.se' up
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
cluster contains disc nodes again
-------------- next part --------------
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
Stopping Rabbit
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
application: rabbitmq_management
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
application: rabbitmq_management_agent
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
stopped SSL Listener on [::]:5671
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
application: rabbit
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
application: os_mon
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
application: mnesia
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:50:10 ===
Halting Erlang VM
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Limiting to approx 924 file handles (829 sockets)
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Memory limit set to 805MB of 2013MB total.
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Disk free limit set to 1000MB
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Management plugin upgraded statistics to fine.
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index
=WARNING REPORT==== 1-Oct-2012::13:50:14 ===
msg_store_persistent: recovery terms differ from present
rebuilding indices from scratch
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Adding mirror of queue 'simlu.queue' in vhost '/' on node 'rabbit at rabbitmq-test-disk01.it.su.se': <0.250.0>
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
started SSL Listener on [::]:5671
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
rabbit on node 'rabbit at rabbitmq-test-ram01.it.su.se' up
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Management agent started.
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Management plugin started. Port: 443, path: /
=INFO REPORT==== 1-Oct-2012::13:50:14 ===
Statistics database started.
=INFO REPORT==== 1-Oct-2012::13:50:15 ===
accepting AMQP connection <0.315.0> (130.237.95.133:55602 -> 77.238.35.76:5671)
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
Mirrored-queue (queue 'simlu.queue' in vhost '/'): Slave <'rabbit at rabbitmq-test-disk01.it.su.se'.3.250.0> saw deaths of mirrors <'rabbit at rabbitmq-test-ram01.it.su.se'.1.200.0>
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
Mirrored-queue (queue 'simlu.queue' in vhost '/'): Promoting slave <'rabbit at rabbitmq-test-disk01.it.su.se'.3.250.0> to master
=INFO REPORT==== 1-Oct-2012::13:50:42 ===
rabbit on node 'rabbit at rabbitmq-test-ram01.it.su.se' down
=INFO REPORT==== 1-Oct-2012::13:50:45 ===
rabbit on node 'rabbit at rabbitmq-test-ram01.it.su.se' up
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
Stopping Rabbit
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
application: rabbitmq_management
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
application: rabbitmq_management_agent
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
stopped SSL Listener on [::]:5671
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
application: rabbit
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
application: os_mon
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
application: mnesia
exited: stopped
type: permanent
=INFO REPORT==== 1-Oct-2012::13:55:35 ===
Halting Erlang VM
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Limiting to approx 924 file handles (829 sockets)
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Memory limit set to 805MB of 2013MB total.
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Disk free limit set to 1000MB
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Management plugin upgraded statistics to fine.
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index
=WARNING REPORT==== 1-Oct-2012::13:55:39 ===
msg_store_persistent: recovery terms differ from present
rebuilding indices from scratch
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Adding mirror of queue 'simlu.queue' in vhost '/' on node 'rabbit at rabbitmq-test-disk01.it.su.se': <0.250.0>
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
started SSL Listener on [::]:5671
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
rabbit on node 'rabbit at rabbitmq-test-ram01.it.su.se' up
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Management agent started.
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Management plugin started. Port: 443, path: /
=INFO REPORT==== 1-Oct-2012::13:55:39 ===
Statistics database started.
=INFO REPORT==== 1-Oct-2012::13:55:40 ===
accepting AMQP connection <0.315.0> (130.237.95.133:55642 -> 77.238.35.76:5671)
More information about the rabbitmq-discuss
mailing list