[rabbitmq-discuss] rabbitmq_status_web crashed on daylight savings time switchover, took down RabbitMQ

Matthew Sackman matthew at lshift.net
Mon Mar 15 22:23:45 GMT 2010


Hi Greg,

On Mon, Mar 15, 2010 at 12:10:08PM -0700, Greg Campbell wrote:
> Let me
> know if you could use any other information.  As mentioned, this happened
> across multiple machines (QA, staging, production) simultaneously, and in
> all cases RabbitMQ shut down.  We're using version 1.7.2.

We did eventually track this down. Rabbit would have done a controlled
shutdown so there was never any risk of persistent messages being lost.
It's difficult to see what should have happened instead: the status plugin
crashed, restarted, and crashed again fast enough to convince Erlang that
this wasn't an error that could be recovered from. Erlang then shut down all
the other applications, which is what it's meant to do. We could change
this latter behaviour, but then again, that suggests that you could
classify some plugins as "crucial" (i.e. if they go down, you do want
everything else to go down too) and others as "non-crucial". But as soon
as you make such a suggestion, you rather wonder a) why are you running
plugins that are "non-crucial" - if you don't need them, then don't run
them; and b) if you found that the status plugin had crashed and had
been shut down, but the rest of Rabbit had survived, what would you then
have done? Surely restarting Rabbit would have been your next step?
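
As a rough illustration of the behaviour described above, here is a toy
model in Python. It is not Erlang/OTP code: the class and function names,
the limit of 3 restarts and the 10 second window are all assumptions made
up for the example. It shows a restart limit being exceeded by a plugin
that crashes repeatedly in quick succession, and how a "crucial" versus
"non-crucial" flag would decide whether the rest of the node goes down
with it.

```python
from collections import deque


class RestartLimiter:
    """Toy model of a supervisor's restart limit (hypothetical, not OTP code)."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.crash_times = deque()

    def record_crash(self, now):
        """Return True if a restart is still allowed, False if we give up."""
        self.crash_times.append(now)
        # Forget crashes that are older than the window.
        while now - self.crash_times[0] > self.window_seconds:
            self.crash_times.popleft()
        return len(self.crash_times) <= self.max_restarts


def node_survives(plugin_is_crucial, crash_times,
                  max_restarts=3, window_seconds=10):
    """Decide whether the node stays up given the plugin's crash timestamps.

    The limits are illustrative defaults, not values taken from RabbitMQ.
    """
    limiter = RestartLimiter(max_restarts, window_seconds)
    for t in crash_times:
        if not limiter.record_crash(t):
            # The plugin is considered unrecoverable. A crucial plugin
            # takes everything else down; a non-crucial one does not.
            return not plugin_is_crucial
    return True


rapid_crashes = [0.0, 0.1, 0.2, 0.3]
print(node_survives(True, rapid_crashes))    # crucial plugin
print(node_survives(False, rapid_crashes))   # non-crucial plugin
```

In this sketch the same burst of crashes stops the node only when the
plugin is marked crucial; crashes spread far apart never trip the limit
at all.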

I think really what this is getting towards is that we need a way of
dynamically starting and stopping plugins in a running Rabbit instance.
This may not be possible with all plugins - some hook into the boot
sequence of Rabbit very early on and so couldn't be started up once
Rabbit is up and running, but others could potentially be started and
stopped and restarted dynamically.
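
To make the idea concrete, here is a minimal sketch in Python of what such
a facility might look like. Everything in it is hypothetical: the
PluginManager class, its methods and the plugin names are invented for
illustration and do not correspond to any RabbitMQ API. Plugins that hook
into the boot sequence are flagged as boot-only and refuse to start once
the node is running; the others can be started and stopped at will.

```python
class PluginManager:
    """Toy plugin registry (hypothetical names, not RabbitMQ's API)."""

    def __init__(self):
        self.node_running = False
        self.boot_only = {}    # plugin name -> True if it needs the boot sequence
        self.running = set()

    def register(self, name, boot_only=False):
        self.boot_only[name] = boot_only

    def boot(self, names):
        """Start the node; any registered plugin may be started here."""
        for name in names:
            if name not in self.boot_only:
                raise KeyError(f"unknown plugin: {name}")
            self.running.add(name)
        self.node_running = True

    def start(self, name):
        """Start a plugin, refusing boot-only plugins on a running node."""
        if name not in self.boot_only:
            raise KeyError(f"unknown plugin: {name}")
        if self.node_running and self.boot_only[name]:
            raise RuntimeError(f"{name} can only be started at boot")
        self.running.add(name)

    def stop(self, name):
        """Stop a plugin without touching the rest of the node."""
        self.running.discard(name)


manager = PluginManager()
manager.register("status")                      # ordinary plugin
manager.register("early_hook", boot_only=True)  # hooks into the boot sequence
manager.boot(["early_hook"])

manager.start("status")   # fine on a running node
manager.stop("status")    # the node itself stays up
manager.start("status")   # and the plugin can come back later
```

With something along these lines, a crashed status plugin could be stopped
and restarted on its own rather than forcing a restart of the whole node.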

Matthew



