[rabbitmq-discuss] RabbitMQ internal error

Michael Arnoldus chime at mu.dk
Tue Mar 4 11:01:52 GMT 2008


Found the problem.

Among other things, we use AMQP/RabbitMQ as a transport for RPC-style
calls. A quick way to implement this was to create a new anonymous
queue for each expected reply and send the queue name in the
'reply to' field. We did, and it worked, but we forgot two things:
1. to destroy the queue after it was used, and 2. to set the queue to
auto-delete, so that if the module crashes the queue gets deleted
anyway. We also have a watchdog that pings all our AMQP modules; if a
module does not reply within some time, the watchdog kills it and
makes it restart.
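
As a rough sketch of the reply-queue pattern with both forgotten steps
in place, the Python below declares a per-call reply queue as
auto-delete and deletes it explicitly once the call finishes or fails.
FakeChannel and rpc_call are hypothetical names. FakeChannel is an
in-memory stand-in, not a real AMQP client library, so it only records
the auto-delete flag; with a real broker that flag is what would cover
a crashed module.

```python
import uuid


class FakeChannel:
    """In-memory stand-in for an AMQP channel (hypothetical, illustration only)."""

    def __init__(self):
        # queue name -> {"auto_delete": bool, "messages": list of bytes}
        self.queues = {}

    def queue_declare(self, name, auto_delete):
        self.queues[name] = {"auto_delete": auto_delete, "messages": []}

    def queue_delete(self, name):
        self.queues.pop(name, None)

    def publish(self, routing_key, body, reply_to=None):
        # Toy "server": answer immediately on the reply queue, if it exists.
        if reply_to in self.queues:
            self.queues[reply_to]["messages"].append(b"reply to " + body)

    def get(self, name):
        # A real client would wait here with a timeout; the fake returns
        # None straight away when no reply is available.
        messages = self.queues[name]["messages"]
        return messages.pop(0) if messages else None


def rpc_call(channel, routing_key, body):
    """Send one request and return the reply, or None if none arrived."""
    reply_queue = "rpc.reply." + uuid.uuid4().hex
    # Step 2 from the list above: mark the queue auto-delete.
    channel.queue_declare(reply_queue, auto_delete=True)
    try:
        channel.publish(routing_key, body, reply_to=reply_queue)
        return channel.get(reply_queue)
    finally:
        # Step 1 from the list above: destroy the queue after use,
        # whether the call succeeded, timed out, or raised.
        channel.queue_delete(reply_queue)
```

Calling rpc_call repeatedly against the fake channel leaves no queues
behind, because the finally clause runs on every exit path.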

So when we ran out of queues, RabbitMQ simply stopped responding,
which caused the watchdog to kill the AMQP modules. They restarted and
tried again, and so on.

The result was a heap of clients and a heap of queues.

The fix was three things: set the RPC reply queues to auto-delete,
destroy them actively after use or on timeout, and modify the watchdog
so it won't kill anything unless it is actually able to ping itself
through AMQP.
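
The watchdog change can be sketched roughly as follows. This is only
an illustration of the rule, not our actual code: watchdog_pass and
its callable arguments are hypothetical names, and each ping callable
is assumed to return False once its own timeout has expired.

```python
def watchdog_pass(modules, ping_module, ping_self, kill_module):
    """Run one watchdog round and return the names of the killed modules.

    ping_self() sends a ping through AMQP back to the watchdog itself.
    ping_module(name) pings one module through AMQP.
    kill_module(name) kills the module so that it gets restarted.
    """
    if not ping_self():
        # The watchdog cannot reach itself through the broker, so a
        # missing reply says nothing about the modules. Kill nothing.
        return []
    killed = []
    for name in modules:
        if not ping_module(name):
            kill_module(name)
            killed.append(name)
    return killed
```

When the broker itself is unresponsive, ping_self fails and the round
ends without killing anything, so the restart loop described above
does not start.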

Now everything works with a steady queue count.

Thanks to Tony for all the help in finding this bug. Your support is
awesome!!!

Regards,

Michael Arnoldus

On Feb 28, 2008, at 8:52, Tony Garnock-Jones wrote:

> Hi Michael,
>
> Michael Arnoldus wrote:
>> Yesterday we experienced another problem with RabbitMQ. Possibly  
>> still our own fault, but this time a bit more severe. Suddenly from  
>> out of the blue it was impossible to send a single message through  
>> Rabbit. Even restart of the components connecting to rabbit didn't  
>> help. The erlang process stayed but didn't seem to work. Killing  
>> the beam process helped and everything returned to normal.
>
> This is extremely interesting.
>
> - What architecture are you running on? Is it a Mac?
> - Was the CPU pinned to 100%?
> - Were you able to issue commands at the Erlang prompt in the server?
>
> We are tracking down what we suspect to be a Mac-specific bug in the  
> Erlang runtime that manifests in some corner-cases of socket  
> shutdown - it would be interesting if you have detected the same  
> thing we're chasing. (We are still in the early stages of our  
> investigation - we can't say for sure yet whether it's really a  
> runtime problem.)
>
>> In a log file we had:
>> ERROR    2008-02-26 16:17:32,857   --call got Closed:  
>> Method(name=close, id=60) (541, 'INTERNAL_ERROR', 0, 0) content =  
>> None
>
> If only the other log files hadn't been stomped on by the broker  
> startup! Your message has prompted us to fix this bad behaviour - we  
> have changed the startup scripts to move existing log files out of  
> the way, keeping the most recent few files.
>
> The INTERNAL_ERROR message is very interesting, because it indicates  
> a real bug in the broker. We don't see it in the case of the Mac bug  
> I mentioned earlier, so you might have found something different.
>
> This is probably the code that ran:
>
> lookup_amqp_exception(Other) ->
>    rabbit_log:warning("Non-AMQP exit reason '~p'~n", [Other]),
>    {true, ?INTERNAL_ERROR, <<"INTERNAL_ERROR">>, none}.
>
> ... which produces a "Non-AMQP exit reason" message in the log. I'm  
> afraid without that message, we'll have a tough time diagnosing this  
> one.
>
> Regards,
>  Tony
