[rabbitmq-discuss] RabbitMQ internal error
Alexis Richardson
alexis.richardson at cohesiveft.com
Tue Mar 4 13:41:09 GMT 2008
Michael
Python is perfect. I was thinking of client use cases for set up,
destroy after time out, etc. This would be useful for folks like John
I think...
alexis
On Tue, Mar 4, 2008 at 12:20 PM, Michael Arnoldus <chime at mu.dk> wrote:
> Alexis,
>
> Yes, that might be possible. Not sure what you think would be useful
> though, could you elaborate a bit?
>
> Everything we have done so far has been in Python, not in Erlang -
> just so you know :-)
>
> Regards,
>
> Michael
>
>
>
> On Mar 4, 2008, at 12:42 , Alexis Richardson wrote:
>
> > Michael
> >
> > Some of your client code might be useful as part of a management
> > tool. Might it be separable from your proprietary code, and
> > sharable in any form?
> >
> > alexis
> >
> >
> > On Tue, Mar 4, 2008 at 11:01 AM, Michael Arnoldus <chime at mu.dk> wrote:
> >> Found the problem.
> >>
> >> Among other things we use AMQP/RabbitMQ as a transport for RPC style
> >> calls. A fast way to implement this was to create a new anonymous
> >> queue for the expected reply and then send the queue name in the
> >> 'reply to' field. We did and it worked, however we forgot two things:
> >> 1. destroy the queue after it was used, and 2. set the queue to
> >> auto-delete so that if the module actually crashes, the queue gets
> >> deleted anyway. We have a watchdog that pings all our AMQP modules
> >> and, if one does not reply within some time, kills the module and
> >> makes it restart.
> >>
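A minimal sketch of the reply-queue lifecycle described in the quoted
paragraph above, with both forgotten steps in place. FakeBroker and all
of its methods are hypothetical in-memory stand-ins so the example runs
without a server; they are not the API of any real client library. On a
real broker the corresponding AMQP methods would be queue.declare (with
the auto-delete flag set) and queue.delete.

```python
import itertools
import queue


class FakeBroker:
    """Hypothetical in-memory stand-in for a broker; not a real client API."""

    def __init__(self):
        self.queues = {}
        self._ids = itertools.count(1)

    def declare_reply_queue(self):
        # On a real broker: queue.declare with an empty name and the
        # auto-delete flag set, so the broker can clean up if the
        # client dies before it deletes the queue itself.
        name = "reply-%d" % next(self._ids)
        self.queues[name] = queue.Queue()
        return name

    def delete_queue(self, name):
        # On a real broker: queue.delete.
        self.queues.pop(name, None)

    def publish(self, routing_key, body, reply_to=None):
        # Toy RPC server: echo the request straight back to reply_to.
        if routing_key == "echo" and reply_to in self.queues:
            self.queues[reply_to].put(body)


def rpc_call(broker, routing_key, body, timeout=0.1):
    """Send a request, wait for the reply, and always drop the reply queue."""
    reply_queue = broker.declare_reply_queue()
    try:
        broker.publish(routing_key, body, reply_to=reply_queue)
        try:
            return broker.queues[reply_queue].get(timeout=timeout)
        except queue.Empty:
            return None  # timed out; the finally clause still cleans up
    finally:
        broker.delete_queue(reply_queue)


broker = FakeBroker()
ok = rpc_call(broker, "echo", "ping")
lost = rpc_call(broker, "nobody-listening", "ping")
print(ok, lost, len(broker.queues))  # ping None 0
```

The try/finally is the point of the sketch: the reply queue is removed
whether the call succeeds, times out, or raises, so the queue count
stays flat across calls.
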
> >> So when we ran out of queues, RabbitMQ simply stopped responding,
> >> causing the watchdog to kill the AMQP modules, which would then
> >> restart and try again, and so on.
> >>
> >> The result was a heap of clients and a heap of queues.
> >>
> >> The fix had three parts: set RPC reply queues to auto-delete,
> >> destroy them actively after use or timeout, and modify the watchdog
> >> so it won't kill anything unless it's actually able to ping itself
> >> through AMQP.
> >>
> >> Now everything works with a steady queue count.
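The watchdog part of the fix quoted above can be sketched roughly as
follows. This is an illustration under assumptions, not Michael's code:
watchdog_sweep, the module names and the ping/kill callables are all
hypothetical, and in real use ping() would send a message through AMQP
and wait for the reply.

```python
def watchdog_sweep(modules, ping, self_name, kill):
    """Kill unresponsive modules, but only if the broker path itself works."""
    if not ping(self_name):
        # Our own ping did not come back through the broker, so silence
        # from the other modules proves nothing. Leave them alone.
        return []
    killed = []
    for name in modules:
        if name != self_name and not ping(name):
            kill(name)
            killed.append(name)
    return killed


# Simulated ping results keyed by (hypothetical) module name.
broker_down = {"watchdog": False, "billing": False, "mailer": False}
one_stuck = {"watchdog": True, "billing": True, "mailer": False}

killed_log = []
for status in (broker_down, one_stuck):
    killed_log.append(
        watchdog_sweep(sorted(status), status.get, "watchdog", lambda name: None)
    )
print(killed_log)  # [[], ['mailer']]
```

When the broker is unreachable nothing is killed, which breaks the
kill-and-restart loop; when only one module is stuck, only that module
is restarted.
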
> >>
> >> Thanks to Tony for all the help in finding this bug. Your support is
> >> awesome!!!
> >>
> >> Regards,
> >>
> >> Michael Arnoldus
> >>
> >>
> >> On Feb 28, 2008, at 8:52 , Tony Garnock-Jones wrote:
> >>
> >>
> >>
> >>> Hi Michael,
> >>>
> >>> Michael Arnoldus wrote:
> >>>> Yesterday we experienced another problem with RabbitMQ. Possibly
> >>>> still our own fault, but this time a bit more severe. Suddenly,
> >>>> out of the blue, it became impossible to send a single message
> >>>> through Rabbit. Even restarting the components connecting to
> >>>> Rabbit didn't help. The Erlang process stayed up but didn't seem
> >>>> to work. Killing the beam process helped and everything returned
> >>>> to normal.
> >>>
> >>> This is extremely interesting.
> >>>
> >>> - What architecture are you running on? Is it a Mac?
> >>> - Was the CPU pinned to 100%?
> >>> - Were you able to issue commands at the Erlang prompt in the
> >>> server?
> >>>
> >>> We are tracking down what we suspect to be a Mac-specific bug in the
> >>> Erlang runtime that manifests in some corner-cases of socket
> >>> shutdown - it would be interesting if you have detected the same
> >>> thing we're chasing. (We are still in the early stages of our
> >>> investigation - we can't say for sure yet whether it's really a
> >>> runtime problem.)
> >>>
> >>>> In a log file we had:
> >>>> ERROR 2008-02-26 16:17:32,857 --call got Closed:
> >>>> Method(name=close, id=60) (541, 'INTERNAL_ERROR', 0, 0) content =
> >>>> None
> >>>
> >>> If only the other log files hadn't been stomped on by the broker
> >>> startup! Your message has prompted us to fix this bad behaviour - we
> >>> have changed the startup scripts to move existing log files out of
> >>> the way, keeping the most recent few files.
> >>>
> >>> The INTERNAL_ERROR message is very interesting, because it indicates
> >>> a real bug in the broker. We don't see it in the case of the Mac bug
> >>> I mentioned earlier, so you might have found something different.
> >>>
> >>> This is probably the code that ran:
> >>>
> >>> lookup_amqp_exception(Other) ->
> >>>     rabbit_log:warning("Non-AMQP exit reason '~p'~n", [Other]),
> >>>     {true, ?INTERNAL_ERROR, <<"INTERNAL_ERROR">>, none}.
> >>>
> >>> ... which produces a "Non-AMQP exit reason" message in the log. I'm
> >>> afraid without that message, we'll have a tough time diagnosing this
> >>> one.
> >>>
> >>> Regards,
> >>> Tony
> >
--
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)