[rabbitmq-discuss] Weird Crash (91MB message over STOMP) [Reproducible]

Sat Aug 8 09:46:53 BST 2009

>>>> Okay, so after enabling verbose logging, I was able to replicate the
>>>> error, reliably.
>>>
>>> including the "messages show up as unacknowledged after restart" problem?
>>
>> YES.  I think this problem is also STOMP-specific!  After recovering
>> the persister from the last crash, I start up a single STOMP client
>> and attempt to subscribe and get the first message off the queue.  At
>> that time, rabbit.log generates this error:
>
> Hang on. At what point do you see the unacknowledged messages? Right after
> the restart? (and how? with 'rabbitmqctl list_queues'?) Or after you start
> up that single STOMP client?
>
> I am asking because when I recover your latest db dir into my rabbit
> instance I see three *ready* messages in the '500.manager.workers' queue.

I know what you mean.
I see the 'unacknowledged messages' after I start up the STOMP
clients.  So, I'm thinking the order of operations is:

1) Unacknowledged messages exist on the queue
2) RabbitMQ dies
3) RabbitMQ starts up
4) Recovery mode starts, marks all un-ack'd messages as 'ready'
5) STOMP clients connect
6) RabbitMQ generates the STOMP error
7) I check the rabbitmqctl output, and see that there are un-ack'd messages

To be honest, I can't seem to replicate the issue where the STOMP
clients disconnect and the messages remain 'un-ack'd' -- I'm thinking
this error may be transient or some weird corner case.  If I ever
encounter that scenario again, I'll be sure to save the mnesia
directory at that point.

>> =ERROR REPORT==== 8-Aug-2009::03:40:01 ===
>> STOMP Reply command unhandled: {'basic.deliver',
>>                                   <<"Q_500.manager.workers">>,
>>                                   1,
>>                                   false,
>>                                   <<"events">>,
>>                                   <<"500.job.create.job.urls.job_alerts">>}
>> {content,60,
>>         none,
>> ... followed by the entire message contents...
>
> That may well be a bug. Can you send me just a little bit more of the above
> error? Another k or so should do.

Okay, I'll send you another direct email with an attachment of the log.

>>> You say rabbit died with zero logging. That may well be true, but rabbit
>>> *did* produce a crash dump, and that should allow us to establish the
>>> cause of death.
>>
>> Okay, good to know.
>
> It ran out of memory, which is what I suspected.
>
>> This is interesting. So I resent the large message with 'delivery-mode' of
>> '1' instead of '2' (which should make it NOT persistent -- right?). RabbitMQ
>> still dies.
>
> Interesting indeed. Yes, that means the persister is unlikely to be
> causing the OoM.
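For anyone trying to reproduce the non-persistent resend, here is a minimal sketch of the STOMP 1.0 SEND frame involved. It assumes the STOMP adapter copies a `delivery-mode` header onto the AMQP property (1 = non-persistent, 2 = persistent), as described above. The function name `build_stomp_send` and the destination `/queue/test` are hypothetical. Only the frame is built here; nothing is sent to a broker.

```python
def build_stomp_send(destination: str, body: bytes, persistent: bool) -> bytes:
    """Build a STOMP 1.0 SEND frame: command, headers, blank line, body, NUL.

    Assumption: the broker's STOMP adapter honours a 'delivery-mode' header.
    """
    headers = {
        "destination": destination,
        # AMQP delivery-mode: "1" = non-persistent, "2" = persistent.
        "delivery-mode": "2" if persistent else "1",
        # content-length lets the broker read a body that may itself contain NUL bytes.
        "content-length": str(len(body)),
    }
    head = "SEND\n" + "".join(f"{k}:{v}\n" for k, v in headers.items()) + "\n"
    # The frame is terminated by a single NUL byte after the body.
    return head.encode("utf-8") + body + b"\x00"
```

With a ~91 MB body (e.g. `b"x" * (91 * 1024 * 1024)`), the frame is built entirely in client memory before it is written to the socket. The broker must then buffer a similarly sized message on its side, whether or not it is persistent.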

I don't have a pure AMQP test client, but I'm curious whether this error
condition also occurs if the large message is sent over AMQP instead of
STOMP...

-- Darien