[rabbitmq-discuss] rabbitmq_status_web crashed on daylight savings time switchover, took down RabbitMQ

Matthew Sackman matthew at lshift.net
Mon Mar 15 16:29:30 GMT 2010


Hi Greg,

On Sun, Mar 14, 2010 at 07:45:35PM -0700, Greg Campbell wrote:
> ** Generic server rabbit_status_web terminating
> ** Last message in was get_context
> ** When Server state == {state,1268531993140,"Sun, 14 Mar 2010 09:59:53 GMT",
> 
> ... a bunch of queue status data...
> 
> ** Reason for termination ==
> ** {{case_clause,[]},
>     [{httpd_util,rfc1123_date,1},
>      {rabbit_status_web,internal_update,1},
>      {rabbit_status_web,handle_call,3},
>      {gen_server,handle_msg,5},
>      {proc_lib,init_p_do_apply,3}]}

This doesn't indicate that Rabbit itself crashed, only that the
rabbit_status_web process crashed. That process is under a supervisor
hierarchy and should have been automatically restarted.
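
To illustrate the restart behaviour, here is a minimal sketch of a
supervised process being restarted after a crash. The module name
restart_demo, the registered names and the restart limits are all made
up for the example; it is not how the status plugin is actually wired up.

```erlang
-module(restart_demo).
-behaviour(supervisor).

-export([start_link/0, init/1, start_worker/0]).

%% Start the (hypothetical) supervisor under a registered name.
start_link() ->
    supervisor:start_link({local, restart_demo_sup}, ?MODULE, []).

%% one_for_one: only the crashed child is restarted.
%% 3 restarts within 10 seconds are tolerated; these limits are
%% illustrative values, not the plugin's real settings.
init([]) ->
    Child = {demo_worker,
             {?MODULE, start_worker, []},
             permanent, 5000, worker, [?MODULE]},
    {ok, {{one_for_one, 3, 10}, [Child]}}.

%% Called in the supervisor process, so spawn_link/1 links the
%% worker to its supervisor.
start_worker() ->
    Pid = spawn_link(fun worker_loop/0),
    register(demo_worker, Pid),
    {ok, Pid}.

%% A trivial worker that dies on request, standing in for a crash.
worker_loop() ->
    receive
        crash -> exit(simulated_crash);
        _Other -> worker_loop()
    end.
```

After restart_demo:start_link(), sending `demo_worker ! crash` produces a
crash report in the log, and shortly afterwards whereis(demo_worker)
returns a new pid while the supervisor carries on.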

I have reproduced this, in that I can make a very similar crash happen
(though without messing with my clock). Similar entries appear in the
logs, but the rabbit_status_web process restarts correctly and Rabbit
itself carries on perfectly happily.

> The underlying issue in httpd_util:rfc1123_date appears to be an Erlang bug,
> which I believe has been fixed in R13B04 (we're still running R13B03 on the
> server, though):
> http://www.erlang.org/cgi-bin/ezmlm-cgi?2:mss:1681:201001:jffdfifdokimdicjnpcp
> .

Interesting bug, and I'm glad it's now been fixed.
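
For the archives, a rough sketch of the kind of defensive handling that
avoids this sort of case_clause. It assumes the crash came from matching
on the result of calendar:local_time_to_universal_time_dst/1, which
returns [] for a local time skipped by the switch to daylight saving
time. The module dst_safe_date and its functions are hypothetical, and
this is not the actual OTP fix.

```erlang
-module(dst_safe_date).

-export([to_utc/1, pick/2]).

%% Convert a local datetime to UTC without crashing on DST edge cases.
to_utc(LocalTime) ->
    pick(calendar:local_time_to_universal_time_dst(LocalTime), LocalTime).

%% []           : the local time does not exist (hour skipped when the
%%                clocks go forward), so report it instead of crashing.
%% [Utc]        : the ordinary, unambiguous case.
%% [_Dst, Utc]  : the local time occurs twice (clocks going back); this
%%                sketch arbitrarily picks the non-DST interpretation.
pick([], LocalTime) ->
    {nonexistent, LocalTime};
pick([Utc], _LocalTime) ->
    {ok, Utc};
pick([_DstUtc, Utc], _LocalTime) ->
    {ok, Utc}.
```

What to_utc/1 returns for a given datetime depends on the time zone of
the machine it runs on, so only pick/2 is deterministic.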

> However, it might be worth ensuring that a crash in a plugin doesn't cause
> the entire system to shut down.

We don't think it did ;) Do you have any other log entries that
indicate something else went wrong? It's possible the same bug affected
more than just the plugin, but the entry you showed us only shows a
process within the status plugin going down, and that process should
have been restarted immediately.

Matthew



