[rabbitmq-discuss] RabbitMQ internal error
Michael Arnoldus
chime at mu.dk
Tue Mar 4 12:20:55 GMT 2008
Alexis,
Yes, that might be possible. I'm not sure what you think would be
useful, though. Could you elaborate a bit?
Everything we have done so far has been done in Python, not in
Erlang - just so you know :-)
Regards,
Michael
On Mar 4, 2008, at 12:42 , Alexis Richardson wrote:
> Michael
>
> Some of your client code might be useful as the basis for part of a
> management tool. Could it be separated from your proprietary code
> and shared in some form?
>
> alexis
>
>
> On Tue, Mar 4, 2008 at 11:01 AM, Michael Arnoldus <chime at mu.dk> wrote:
>> Found the problem.
>>
>> Among other things we use AMQP/RabbitMQ as a transport for RPC style
>> calls. A fast way to implement this was to create a new anonymous
>> queue for the expected reply and then send the queue name in the
>> 'reply to' field. We did, and it worked, but we forgot two things:
>> 1. destroy the queue after it has been used, and 2. set the queue to
>> auto-delete, so that if the module crashes the queue gets deleted
>> anyway. We have a watchdog function that pings all our AMQP
>> modules, and in case of no reply (over some time) it will kill the
>> module and make it restart.
>>
>> So when we ran out of queues, RabbitMQ simply stopped responding,
>> causing the watchdog to kill the AMQP modules, which then restarted
>> and tried again, and so on.
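A watchdog can avoid this kind of restart loop by checking its own path through the broker before blaming a module. The following is only a minimal sketch in Python: `should_restart`, `ping_self` and `ping_module` are hypothetical names, and both callables are assumed to return True when a ping sent through AMQP was answered within the timeout.

```python
def should_restart(ping_self, ping_module):
    """Decide whether the watchdog may kill and restart a module.

    Both arguments are hypothetical callables (assumptions for this
    sketch) returning True when a ping through the broker was answered.
    """
    if not ping_self():
        # The watchdog cannot even reach itself through the broker, so the
        # broker (or the path to it) is the likely problem. Killing modules
        # would only add more clients and more queues.
        return False
    # The broker path works, so a missed reply points at the module itself.
    return not ping_module()
```

With this guard, a stalled broker leads to no restarts at all, while a module that stays silent on a healthy broker is still restarted.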
>>
>> The result was a heap of clients and a heap of queues.
>>
>> The fix was three things: set RPC reply queues to auto-delete, destroy
>> them actively after use or timeout, and modify the watchdog so it won't
>> kill anything unless it's actually able to ping itself through AMQP.
>>
>> Now everything works with a steady queue count.
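The "destroy them actively after use or timeout" part of the fix can be sketched in Python as follows. This is a toy model under stated assumptions, not real client code: `ToyBroker` is a hypothetical in-memory stand-in and not the RabbitMQ or any AMQP client API, and `handler` stands in for the remote module that would answer the call. Auto-delete is a broker-side queue setting and is not modelled here; the sketch only shows the cleanup path.

```python
import uuid


class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, NOT an AMQP client)."""

    def __init__(self):
        self.queues = {}  # queue name -> list of pending messages

    def declare_anonymous_queue(self):
        name = "reply-" + uuid.uuid4().hex
        self.queues[name] = []
        return name

    def delete_queue(self, name):
        self.queues.pop(name, None)

    def publish(self, queue, body):
        self.queues[queue].append(body)

    def get(self, queue):
        messages = self.queues[queue]
        return messages.pop(0) if messages else None


def rpc_call(broker, handler, request, max_polls=3):
    """One RPC round trip that always removes its reply queue."""
    reply_to = broker.declare_anonymous_queue()
    try:
        # A real server would consume the request and publish its answer to
        # the queue named in 'reply to'; the hypothetical handler is called
        # inline here to stand in for that.
        handler(broker, request, reply_to)
        for _ in range(max_polls):
            reply = broker.get(reply_to)
            if reply is not None:
                return reply
        raise TimeoutError("no RPC reply")
    finally:
        # Runs on success, on timeout and on any other error, so the
        # queue count stays steady.
        broker.delete_queue(reply_to)
```

The `try`/`finally` is the point of the sketch: the reply queue is deleted whether the call returns, times out or raises, so repeated calls do not accumulate queues.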
>>
>> Thanks to Tony for all the help in finding this bug. Your support is
>> awesome!!!
>>
>> Regards,
>>
>> Michael Arnoldus
>>
>>
>> On Feb 28, 2008, at 8:52 , Tony Garnock-Jones wrote:
>>
>>
>>
>>> Hi Michael,
>>>
>>> Michael Arnoldus wrote:
>>>> Yesterday we experienced another problem with RabbitMQ. Possibly
>>>> still our own fault, but this time a bit more severe. Suddenly from
>>>> out of the blue it was impossible to send a single message through
>>>> Rabbit. Even restarting the components connecting to Rabbit didn't
>>>> help. The Erlang process stayed up but didn't seem to work. Killing
>>>> the beam process helped and everything returned to normal.
>>>
>>> This is extremely interesting.
>>>
>>> - What architecture are you running on? Is it a Mac?
>>> - Was the CPU pinned to 100%?
>>> - Were you able to issue commands at the Erlang prompt in the
>>> server?
>>>
>>> We are tracking down what we suspect to be a Mac-specific bug in the
>>> Erlang runtime that manifests in some corner-cases of socket
>>> shutdown - it would be interesting if you have detected the same
>>> thing we're chasing. (We are still in the early stages of our
>>> investigation - we can't say for sure yet whether it's really a
>>> runtime problem.)
>>>
>>>> In a log file we had:
>>>> ERROR 2008-02-26 16:17:32,857 --call got Closed:
>>>> Method(name=close, id=60) (541, 'INTERNAL_ERROR', 0, 0) content =
>>>> None
>>>
>>> If only the other log files hadn't been stomped on by the broker
>>> startup! Your message has prompted us to fix this bad behaviour - we
>>> have changed the startup scripts to move existing log files out of
>>> the way, keeping the most recent few files.
>>>
>>> The INTERNAL_ERROR message is very interesting, because it indicates
>>> a real bug in the broker. We don't see it in the case of the Mac bug
>>> I mentioned earlier, so you might have found something different.
>>>
>>> This is probably the code that ran:
>>>
>>> lookup_amqp_exception(Other) ->
>>>     rabbit_log:warning("Non-AMQP exit reason '~p'~n", [Other]),
>>>     {true, ?INTERNAL_ERROR, <<"INTERNAL_ERROR">>, none}.
>>>
>>> ... which produces a "Non-AMQP exit reason" message in the log. I'm
>>> afraid without that message, we'll have a tough time diagnosing this
>>> one.
>>>
>>> Regards,
>>> Tony
>>
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>>
>
>
>
> --
> Alexis Richardson
> +44 20 7617 7339 (UK)
> +44 77 9865 2911 (cell)
> +1 650 206 2517 (US)