[rabbitmq-discuss] RabbitMQ internal error

Tue Mar 4 11:42:15 GMT 2008

Michael

Some of your client code might be useful as the basis of a management
tool.  Might it be separable from your proprietary code, and shareable
in some form?

alexis

On Tue, Mar 4, 2008 at 11:01 AM, Michael Arnoldus <chime at mu.dk> wrote:
> Found the problem.
>
>  Among other things we use AMQP/RabbitMQ as a transport for RPC style
>  calls. A fast way to implement this was to create a new anonymous
>  queue for the expected reply and then send the queue name in the
>  'reply to' field. We did and it worked, however we forgot two things:
>  1. Destroy the queue after it was used and 2. set the queue to
>  auto-delete so if the module actually crashes, the queues get deleted
>  anyway. We have a watch-dog functionality that will ping all our AMQP
>  modules, and in case of no reply (over some time) it will kill the
>  module and make it restart.
>
>  So when we ran out of queues RabbitMQ simply stopped responding,
>  causing the watch-dog to kill the AMQP modules, which would then
>  restart and try again, ....
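
The restart loop described here comes from a watchdog that reads a silent
broker as a dead module. A minimal sketch of the guard named in the fix
(kill nothing unless the watchdog can ping itself through AMQP) might look
like the following. The function name and both parameters are hypothetical
and are not taken from Michael's code; how the pings are actually sent is
left out.

```python
def modules_to_restart(module_replied, self_ping_ok):
    """Return the names of modules the watchdog should kill and restart.

    module_replied: dict mapping module name -> True if it answered its ping.
    self_ping_ok:   True if the watchdog's own ping round-tripped via the broker.
    Both inputs are assumptions made for this sketch.
    """
    if not self_ping_ok:
        # The transport itself is not answering, so missing replies say
        # nothing about module health. Restarting would only add load.
        return []
    return sorted(name for name, ok in module_replied.items() if not ok)
```

With a healthy broker, `modules_to_restart({"a": True, "b": False}, True)`
selects only `b`; with the self-ping failing, the same call returns an empty
list and nothing is restarted.
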
>
>  The result was a heap of clients and a heap of queues.
>
>  The fix was 3 things: Set RPC reply queues to auto-delete, destroy
>  them actively after use or timeout, modify the watchdog so it won't
>  kill anything unless it's actually able to ping itself through AMQP.
>
>  Now everything works with a steady queue count.
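
The reply-queue lifecycle from the fix can be sketched roughly as follows.
This is an illustration only: `ToyBroker` is a hypothetical in-memory
stand-in, not a real AMQP client API, and `rpc_call`, `echo` and `silent`
are made-up names. With a real client the reply queue would also be declared
auto-delete, so that the broker cleans it up if the caller crashes before
reaching the explicit delete.

```python
import uuid


class ToyBroker:
    """In-memory stand-in for a broker; NOT a real AMQP client API."""

    def __init__(self):
        self.queues = {}  # queue name -> list of pending messages

    def declare_reply_queue(self):
        # A real client would declare an auto-delete queue here so the
        # broker removes it if the client dies.
        name = "reply-" + uuid.uuid4().hex
        self.queues[name] = []
        return name

    def publish(self, queue, message):
        if queue in self.queues:
            self.queues[queue].append(message)

    def get(self, queue):
        pending = self.queues.get(queue, [])
        return pending.pop(0) if pending else None

    def delete_queue(self, queue):
        self.queues.pop(queue, None)


def rpc_call(broker, handler, request, max_polls=3):
    """Send a request, wait for the reply, and always delete the reply queue."""
    reply_to = broker.declare_reply_queue()
    try:
        # The handler plays the remote module: it publishes to reply_to.
        handler(broker, request, reply_to)
        for _ in range(max_polls):  # stands in for a wall-clock timeout
            reply = broker.get(reply_to)
            if reply is not None:
                return reply
        raise TimeoutError("no reply for %r" % (request,))
    finally:
        # Runs on success, timeout and error alike, so queues cannot pile up.
        broker.delete_queue(reply_to)


def echo(broker, request, reply_to):
    broker.publish(reply_to, "echo:" + request)


def silent(broker, request, reply_to):
    pass  # never replies, forcing the timeout path
```

Putting the delete in a `finally` block covers both "after use" and
"timeout" with one code path, which is what keeps the queue count steady
in this sketch.
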
>
>  Thanks to Tony for all the help in finding this bug. Your support is
>  awesome!!!
>
>  Regards,
>
>  Michael Arnoldus
>
>
>  On Feb 28, 2008, at 8:52 , Tony Garnock-Jones wrote:
>
>
>
> > Hi Michael,
>  >
>  > Michael Arnoldus wrote:
>  >> Yesterday we experienced another problem with RabbitMQ. Possibly
>  >> still our own fault, but this time a bit more severe. Suddenly from
>  >> out of the blue it was impossible to send a single message through
>  >> Rabbit. Even restart of the components connecting to rabbit didn't
>  >> help. The erlang process stayed but didn't seem to work. Killing
>  >> the beam process helped and everything returned to normal.
>  >
>  > This is extremely interesting.
>  >
>  > - What architecture are you running on? Is it a Mac?
>  > - Was the CPU pinned to 100%?
>  > - Were you able to issue commands at the Erlang prompt in the server?
>  >
>  > We are tracking down what we suspect to be a Mac-specific bug in the
>  > Erlang runtime that manifests in some corner-cases of socket
>  > shutdown - it would be interesting if you have detected the same
>  > thing we're chasing. (We are still in the early stages of our
>  > investigation - we can't say for sure yet whether it's really a
>  > runtime problem.)
>  >
>  >> In a log file we had:
>  >> ERROR    2008-02-26 16:17:32,857   --call got Closed:
>  >> Method(name=close, id=60) (541, 'INTERNAL_ERROR', 0, 0) content =
>  >> None
>  >
>  > If only the other log files hadn't been stomped on by the broker
>  > startup! Your message has prompted us to fix this bad behaviour - we
>  > have changed the startup scripts to move existing log files out of
>  > the way, keeping the most recent few files.
>  >
>  > The INTERNAL_ERROR message is very interesting, because it indicates
>  > a real bug in the broker. We don't see it in the case of the Mac bug
>  > I mentioned earlier, so you might have found something different.
>  >
>  > This is probably the code that ran:
>  >
>  > lookup_amqp_exception(Other) ->
>  >    rabbit_log:warning("Non-AMQP exit reason '~p'~n", [Other]),
>  >    {true, ?INTERNAL_ERROR, <<"INTERNAL_ERROR">>, none}.
>  >
>  > ... which produces a "Non-AMQP exit reason" message in the log. I'm
>  > afraid without that message, we'll have a tough time diagnosing this
>  > one.
>  >
>  > Regards,
>  >  Tony

-- 
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)