[rabbitmq-discuss] RabbitMQ internal error

Tue Mar 4 13:41:09 GMT 2008

Michael

Python is perfect.  I was thinking of client use cases for set up,
destroy after time out, etc.  This would be useful for folks like John
I think...

alexis

On Tue, Mar 4, 2008 at 12:20 PM, Michael Arnoldus <chime at mu.dk> wrote:
> Alexis,
>
>  Yes, that might be possible. Not sure what you think would be useful
>  though, could you elaborate a bit?
>
>  Everything we have done has been done in Python (so far at least) and
>  not in erlang - just so you know :-)
>
>  Regards,
>
>  Michael
>
>
>
>  On Mar 4, 2008, at 12:42 , Alexis Richardson wrote:
>
>  > Michael
>  >
>  > Some of your client code might be useful for creating part of a
>  > management tool.  Might it be separable from your proprietary code,
>  > and sharable in any form?
>  >
>  > alexis
>  >
>  >
>  > On Tue, Mar 4, 2008 at 11:01 AM, Michael Arnoldus <chime at mu.dk> wrote:
>  >> Found the problem.
>  >>
>  >> Among other things we use AMQP/RabbitMQ as a transport for RPC style
>  >> calls. A fast way to implement this was to create a new anonymous
>  >> queue for the expected reply and then send the queue name in the
>  >> 'reply to' field. We did and it worked, however we forgot two things:
>  >> 1. destroy the queue after it was used, and 2. set the queue to
>  >> auto-delete so that if the module actually crashes, the queues get deleted
>  >> anyway. We have a watch-dog functionality that will ping all our AMQP
>  >> modules, and in case of no reply (over some time) it will kill the
>  >> module and make it restart.
>  >>
>  >> So when we ran out of queues, RabbitMQ simply stopped responding,
>  >> causing the watch-dog to kill the AMQP modules, which then restarted
>  >> and tried again, ....
>  >>
>  >> The result was a heap of clients and a heap of queues.
>  >>
>  >> The fix was 3 things: set RPC reply queues to auto-delete, destroy
>  >> them actively after use or timeout, and modify the watchdog so it won't
>  >> kill anything unless it's actually able to ping itself through AMQP.
>  >>
>  >> Now everything works with a steady queue count.
>  >>
>  >> Thanks to Tony for all the help in finding this bug. Your support is
>  >> awesome!!!
>  >>
>  >> Regards,
>  >>
>  >> Michael Arnoldus
>  >>
>  >>
>  >> On Feb 28, 2008, at 8:52 , Tony Garnock-Jones wrote:
>  >>
>  >>
>  >>
>  >>> Hi Michael,
>  >>>
>  >>> Michael Arnoldus wrote:
>  >>>> Yesterday we experienced another problem with RabbitMQ. Possibly
>  >>>> still our own fault, but this time a bit more severe. Suddenly from
>  >>>> out of the blue it was impossible to send a single message through
>  >>>> Rabbit. Even restart of the components connecting to rabbit didn't
>  >>>> help. The erlang process stayed but didn't seem to work. Killing
>  >>>> the beam process helped and everything returned to normal.
>  >>>
>  >>> This is extremely interesting.
>  >>>
>  >>> - What architecture are you running on? Is it a Mac?
>  >>> - Was the CPU pinned to 100%?
>  >>> - Were you able to issue commands at the Erlang prompt in the
>  >>> server?
>  >>>
>  >>> We are tracking down what we suspect to be a Mac-specific bug in the
>  >>> Erlang runtime that manifests in some corner-cases of socket
>  >>> shutdown - it would be interesting if you have detected the same
>  >>> thing we're chasing. (We are still in the early stages of our
>  >>> investigation - we can't say for sure yet whether it's really a
>  >>> runtime problem.)
>  >>>
>  >>>> In a log file we had:
>  >>>> ERROR    2008-02-26 16:17:32,857   --call got Closed:
>  >>>> Method(name=close, id=60) (541, 'INTERNAL_ERROR', 0, 0) content =
>  >>>> None
>  >>>
>  >>> If only the other log files hadn't been stomped on by the broker
>  >>> startup! Your message has prompted us to fix this bad behaviour - we
>  >>> have changed the startup scripts to move existing log files out of
>  >>> the way, keeping the most recent few files.
>  >>>
>  >>> The INTERNAL_ERROR message is very interesting, because it indicates
>  >>> a real bug in the broker. We don't see it in the case of the Mac bug
>  >>> I mentioned earlier, so you might have found something different.
>  >>>
>  >>> This is probably the code that ran:
>  >>>
>  >>> lookup_amqp_exception(Other) ->
>  >>>   rabbit_log:warning("Non-AMQP exit reason '~p'~n", [Other]),
>  >>>   {true, ?INTERNAL_ERROR, <<"INTERNAL_ERROR">>, none}.
>  >>>
>  >>> ... which produces a "Non-AMQP exit reason" message in the log. I'm
>  >>> afraid without that message, we'll have a tough time diagnosing this
>  >>> one.
>  >>>
>  >>> Regards,
>  >>> Tony
>  >>
>  >>
>  >
>  >
>  >
>  > --
>  > Alexis Richardson
>  > +44 20 7617 7339 (UK)
>  > +44 77 9865 2911 (cell)
>  > +1 650 206 2517 (US)
>
>

-- 
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)
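
For readers of the archive: the three-part fix Michael describes in the
thread (auto-delete reply queues, explicit deletion after use or timeout,
and a watchdog that only kills a module when the watchdog itself can ping
through AMQP) might be sketched in Python roughly as follows. This is not
Michael's code. FakeChannel and its methods (declare_reply_queue, publish,
get, delete_queue), rpc_call and watchdog_should_kill are hypothetical
names chosen for illustration, and the in-memory channel only stands in
for a real AMQP client so the sketch runs without a broker.

```python
import time
import uuid


class FakeChannel:
    """In-memory stand-in for an AMQP channel (hypothetical, not a real client API)."""

    def __init__(self):
        self.queues = {}  # queue name -> {"auto_delete": bool, "messages": list}

    def declare_reply_queue(self, auto_delete=True):
        # A real broker would pick the name of an anonymous queue itself.
        name = "reply-" + uuid.uuid4().hex
        self.queues[name] = {"auto_delete": auto_delete, "messages": []}
        return name

    def delete_queue(self, name):
        self.queues.pop(name, None)

    def publish(self, routing_key, body, reply_to):
        # Toy "echo" service: answers straight into the reply queue.
        # Any other routing key simulates a service that never replies.
        if routing_key == "echo" and reply_to in self.queues:
            self.queues[reply_to]["messages"].append(body)

    def get(self, queue_name):
        messages = self.queues[queue_name]["messages"]
        return messages.pop(0) if messages else None


def rpc_call(channel, routing_key, body, timeout=1.0, poll_interval=0.01):
    """Send a request and wait for one reply on a per-call reply queue."""
    # auto_delete is the safety net for a client that crashes before the
    # finally block runs; the fake channel only records the flag.
    reply_queue = channel.declare_reply_queue(auto_delete=True)
    try:
        channel.publish(routing_key, body, reply_to=reply_queue)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            reply = channel.get(reply_queue)
            if reply is not None:
                return reply
            time.sleep(poll_interval)
        raise TimeoutError("no reply on " + reply_queue)
    finally:
        # Delete the queue after use or timeout so the queue count stays steady.
        channel.delete_queue(reply_queue)


def watchdog_should_kill(self_ping_ok, module_ping_ok):
    """Kill a silent module only if the watchdog's own ping through AMQP works."""
    return self_ping_ok and not module_ping_ok
```

With the fake channel, rpc_call(FakeChannel(), "echo", "ping") returns the
echoed body and leaves no queue behind. An unanswered call raises
TimeoutError, and the reply queue is still deleted. If the watchdog cannot
ping itself, watchdog_should_kill returns False, since the broker rather
than the module is the likely problem.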