[rabbitmq-discuss] Weird Crash - Recovery logic for durable messages/queues/exchanges?
Darien Kindlund
darien at kindlund.com
Fri Aug 7 06:39:38 BST 2009
>> So after running RabbitMQ v1.6.0 for a while, I've encountered a
>> strange crash, where the server unexpectedly dies with no crash report
>> or any applicable log information written to disk. I'm trying to see
>> if I can replicate the issue, but in the meantime, when I recover the
>> server, it dutifully recovers all my messages, queues, exchanges, and
>> bindings (great!). However, once the server recovers, all the durable
>> messages in the queues (from rabbitmqctl) are still marked as
>> "messages_unacknowledged" -- not "messages_ready"... To my knowledge,
>> this means: "RabbitMQ thinks there is already an AMQP channel and
>> connection open which already has these messages -- and is simply
>> waiting for an ACK back from this AMQP consumer." ... The problem is:
>> when RabbitMQ recovers, all AMQP channels/connections are terminated,
>> so this assumption is clearly wrong (in this scenario).
>
> On recovery the persister requeues recovered messages. They should all be
> counted as 'ready' unless they are sent to a consumer, in which case they
> will show up as 'unacknowledged'.
Okay, your wording is a little vague, so I want to be crystal clear.
Assume we have a single RabbitMQ server, with 3 consumers, where all 3
are consuming messages off the same durable queue (with durable
messages). When all 3 start processing their respective messages,
RabbitMQ marks all 3 durable messages as 'unacknowledged'. Then,
let's assume RabbitMQ crashes (for some reason or another). *STICKING
POINT: Upon crash, all 3 consumers' channels and connections have
terminated -- I assume there's no way for any of the consumers to
"reuse" their existing channels/connections because the RabbitMQ server
died.*
Therefore, when a sysadmin restarts RabbitMQ and the persister is
recovered, will all 3 messages be marked 'ready'? Or will all 3 be
marked 'unacknowledged'? Sorry to be pedantic, but your original
reply was slightly unclear about this.
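
A minimal sketch of the scenario above, written as a toy Python model.
It is not RabbitMQ code, and the class and method names are made up for
illustration. It assumes that an un-ack'd message is tied to a live
channel, so a restart that drops every channel should leave all
recovered messages 'ready':

```python
from collections import deque


class ToyQueue:
    """Toy model of one durable queue; NOT RabbitMQ's implementation."""

    def __init__(self):
        self.ready = deque()   # messages waiting for delivery
        self.unacked = {}      # message -> id of the channel holding it

    def publish(self, msg):
        self.ready.append(msg)

    def deliver(self, channel):
        # Hand the oldest ready message to a consumer on `channel`.
        msg = self.ready.popleft()
        self.unacked[msg] = channel
        return msg

    def ack(self, msg):
        del self.unacked[msg]

    def restart(self):
        # Assumption: a broker restart drops every channel, so each
        # recovered message goes back to 'ready' in its original order.
        recovered = list(self.unacked) + list(self.ready)
        self.unacked.clear()
        self.ready = deque(recovered)

    def counts(self):
        return {"messages_ready": len(self.ready),
                "messages_unacknowledged": len(self.unacked)}


q = ToyQueue()
for m in ("m1", "m2", "m3"):
    q.publish(m)
for ch in (1, 2, 3):          # 3 consumers, one message each
    q.deliver(ch)
before = q.counts()           # all 3 are un-ack'd
q.restart()                   # crash + recovery
after = q.counts()            # expected: all 3 are ready again
print(before, after)
```

Under that assumption, the model ends with 3 ready and 0 un-ack'd
messages, which is the outcome I would expect from the real server.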
> So what you are seeing is rather strange. Are you sure there aren't any
> connected consumers?
I'm sure there are no connected consumers -- although I assume that
when RabbitMQ crashes, all consumer channels/connections are
terminated as well. For good measure, I also had to terminate and
restart epmd... otherwise, RabbitMQ would not start up properly via
'/etc/init.d/rabbitmq start'. FYI, this is on a stock Ubuntu
distribution.
> Also, are you running rabbit as a single node, or in a cluster?
I'm running RabbitMQ as a single node.
To test whether these 'unacknowledged' messages could somehow get
reset, I have:
1) shut down all consumer connections
2) started up a single consumer. Upon doing so, the consumer is NOT
able to fetch any of the un-ack'd messages -- although any new
messages do properly get delivered to the consumer
3) shut down the single consumer
4) verified the un-ack'd messages still exist
5) started up a single consumer... same behavior as #2
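
The check in step 4 can be scripted. A minimal Python sketch, assuming
the counts come from
'rabbitmqctl list_queues name messages_ready messages_unacknowledged'
and that each queue is printed as one tab-separated line. The helper
names and the sample text are hypothetical, not captured from a real
server:

```python
def parse_queue_counts(text):
    """Map queue name -> (messages_ready, messages_unacknowledged)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip banner/footer lines and anything that is not name + 2 ints.
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        counts[fields[0]] = (int(fields[1]), int(fields[2]))
    return counts


def stuck_queues(counts):
    # Queues that still report un-ack'd messages; with no consumers
    # connected, these are the ones showing the problem described above.
    return sorted(name for name, (_, unacked) in counts.items() if unacked > 0)


# Hypothetical sample output (assumed format, including banner lines).
sample = "Listing queues ...\njobs\t0\t3\nevents\t5\t0\n...done.\n"
counts = parse_queue_counts(sample)
print(stuck_queues(counts))
```

Running this once after step 3 and again after step 5 would show
whether the un-ack'd count ever changes.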
As an interesting side case, is there any way to manually reset
un-ack'd messages back into the ready state while RabbitMQ is running
(and consumers/producers are active)? I'm trying to avoid having to
shut down the RabbitMQ server and obliterate the mnesia persister log
in order to clear out these messages. (Destroying and re-creating the
queues isn't ideal either, since I have active consumers processing
newer messages using these same queues.)
-- Darien