[rabbitmq-discuss] Weird Crash - Recovery logic for durable messages/queues/exchanges?

Fri Aug 7 06:39:38 BST 2009

>> So after running RabbitMQ v1.6.0 for a while, I've encountered a
>> strange crash, where the server unexpectedly dies with no crash report
>> or any applicable log information written to disk.  I'm trying to see
>> if I can replicate the issue, but in the meantime, when I recover the
>> server, it dutifully recovers all my messages, queues, exchanges, and
>> bindings (great!).  However, once the server recovers, all the durable
>> messages in the queues (from rabbitmqctl) are still marked as
>> "messages_unacknowledged" -- not "messages_ready"... To my knowledge,
>> this means: "RabbitMQ thinks there is already an AMQP channel and
>> connection open which already has these messages -- and is simply
>> waiting for an ACK back from this AMQP consumer."  ... The problem is:
>> when RabbitMQ recovers, all AMQP channels/connections are terminated,
>> so this assumption is clearly wrong (in this scenario).
>
> On recovery the persister requeues recovered messages. They should all be
> counted as 'ready' unless they are sent to a consumer, in which case they
> will show up as 'unacknowledged'.

Okay, your wording is a little vague, so I want to be crystal clear.
Assume we have a single RabbitMQ server, with 3 consumers, where all 3
are consuming messages off the same durable queue (with durable
messages).  When all 3 start processing their respective messages,
RabbitMQ marks all 3 durable messages as 'unacknowledged'.  Then,
let's assume RabbitMQ crashes (for some reason or another).  *STICKING
POINT: Upon crash, all 3 consumers' channels and connections have
terminated -- I assume there's no way for any of the consumers to
"reuse" their existing channels/connections because RabbitMQ server
died.*

Therefore, when a sysadmin restarts RabbitMQ and the persister is
recovered, will all 3 messages be marked 'ready'?  Or will all 3 be
marked 'unacknowledged'?  Sorry to be pedantic, but your original
reply was slightly unclear about this.
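
To make the scenario concrete, here is a toy model of the behaviour I
would expect, based on the statement that the persister requeues
recovered messages on recovery.  It is only a sketch: ToyDurableQueue,
the message names (m1..m3), and the consumer names (c1..c3) are made up
for illustration, and none of this is RabbitMQ's actual implementation.

```python
from collections import deque


class ToyDurableQueue:
    """Toy model of one durable queue; NOT RabbitMQ's real implementation."""

    def __init__(self, messages):
        self.ready = deque(messages)  # messages waiting to be delivered
        self.unacked = {}             # message -> consumer currently holding it

    def deliver(self, consumer):
        """Hand the next ready message to a consumer; it becomes unacknowledged."""
        msg = self.ready.popleft()
        self.unacked[msg] = consumer
        return msg

    def ack(self, msg):
        """Consumer confirms processing; the message is gone for good."""
        del self.unacked[msg]

    def crash_and_recover(self):
        """Simulate a broker restart.

        Every channel/connection died with the broker, so no consumer can
        still hold a message.  All un-ack'd messages go back to 'ready'.
        Restoring the original order is an assumption of this toy model.
        """
        self.ready.extendleft(reversed(list(self.unacked)))
        self.unacked.clear()

    def counts(self):
        return {
            "messages_ready": len(self.ready),
            "messages_unacknowledged": len(self.unacked),
        }


queue = ToyDurableQueue(["m1", "m2", "m3"])
for consumer in ("c1", "c2", "c3"):
    queue.deliver(consumer)       # all 3 messages are now un-ack'd
before = queue.counts()

queue.crash_and_recover()         # broker dies and is restarted
after = queue.counts()

print("before crash:", before)
print("after recovery:", after)
```

In this model all 3 messages end up 'ready' after recovery, which is
the outcome I am asking you to confirm (or correct).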

> So what you are seeing is rather strange. Are you sure there aren't any
> connected consumers?

I'm sure there are no connected consumers -- although I assume that
when RabbitMQ crashes, all consumer channels/connections are
terminated as well.  For good measure, I also had to terminate and
restart epmd... otherwise, RabbitMQ would not start up properly via
'/etc/init.d/rabbitmq start'.  FYI, this is on a stock Ubuntu
distribution.

> Also, are you running rabbit as a single node, or in a cluster?

I'm running RabbitMQ as a single node.

To test to see if these 'unacknowledged' messages could somehow get
reset, I have:
1) shut down all consumer connections
2) started up a single consumer.  Upon doing so, the consumer is NOT
able to fetch any of the un-ack'd messages -- although any new
messages do properly get delivered to the consumer
3) shut down the single consumer
4) verified the un-ack'd messages still exist
5) started up a single consumer... same behavior as #2
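
The verification in step 4 can be scripted.  Below is a minimal sketch
that parses the text printed by
'rabbitmqctl list_queues name messages_ready messages_unacknowledged'
and reports queues that still hold un-ack'd messages.  It assumes one
tab-separated line per queue and skips anything else (banner or footer
lines).  The sample text and the queue names in it are invented for
illustration; they are not output from my server.

```python
def parse_queue_counts(output):
    """Map queue name -> (messages_ready, messages_unacknowledged).

    Assumes each queue is printed as 'name<TAB>ready<TAB>unacked'.
    Lines that do not match that shape are ignored.
    """
    counts = {}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        name, ready, unacked = fields
        if not (ready.isdigit() and unacked.isdigit()):
            continue
        counts[name] = (int(ready), int(unacked))
    return counts


def stuck_queues(counts):
    """Names of queues that still report un-ack'd messages, sorted."""
    return sorted(name for name, (_, unacked) in counts.items() if unacked > 0)


# Hypothetical sample text, for illustration only.
SAMPLE_OUTPUT = (
    "Listing queues ...\n"
    "work_queue\t0\t3\n"
    "other_queue\t2\t0\n"
    "...done.\n"
)

sample_counts = parse_queue_counts(SAMPLE_OUTPUT)
print(stuck_queues(sample_counts))
```

Run with no consumers connected, any queue this reports is in the
state described above: un-ack'd messages with nobody holding them.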

As an interesting side case, is there any way to manually reset
un-ack'd messages back into the ready state while RabbitMQ is running
(and consumers/producers are active)?  I'm trying to avoid having to
shut down the RabbitMQ server and obliterate the mnesia persister log
in order to clear out these messages.  (Destroying and re-creating the
queues isn't ideal either, since I have active consumers processing
newer messages using these same queues.)

-- Darien