[rabbitmq-discuss] RabbitMQ stops responding for a few seconds, logs a "Context: shutdown_error", then resumes as if nothing happened

Matt Pietrek mpietrek at skytap.com
Mon Oct 15 18:50:20 BST 2012


Tim,

Unfortunately, the RabbitMQ logs are rather silent around the time of this
observed behavior. However, looking at other logs, I'm starting to suspect
that something triggered rsyslog (running on the same box) to send hundreds
of thousands of old log lines to another server, potentially preventing
other processes (e.g. Rabbit) from responding during that time.

I'll update this thread if I find anything further. For now, I'm inclined
to call this closed.
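
For anyone running a similar check: the quoted message below describes a
script that polls /api/queues every 5 seconds and saw one isolated 404. What
follows is only a minimal sketch, not the actual script, of a probe that
treats a node as unhealthy only after several consecutive failures, so a
single blip like this one does not move the VIP. The names fetch_status and
ConsecutiveFailureTracker, the threshold of 3, the port 55672 and the
guest/guest credentials in the usage note are all assumptions made for
illustration.

```python
import base64
import urllib.error
import urllib.request


def fetch_status(url, user, password, timeout=3.0):
    """Return the HTTP status code for a GET on url, or None if no response.

    Hypothetical helper: url, user and password must match your own broker.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # an HTTP error status, e.g. the 404 seen in this thread
    except OSError:
        return None  # connection refused, timeout, DNS failure, ...


class ConsecutiveFailureTracker:
    """Report unhealthy only after `threshold` failed probes in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed value; tune for your setup
        self.failures = 0

    def record(self, status):
        """Record one probe result; return True while the node counts as healthy."""
        if status == 200:
            self.failures = 0
        else:
            self.failures += 1
        return self.failures < self.threshold
```

A caller would create one tracker per node and, every 5 seconds, pass the
result of something like
fetch_status("http://localhost:55672/api/queues", "guest", "guest") to
tracker.record(), failing the health check only when it returns False.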

On Mon, Oct 15, 2012 at 12:25 AM, Tim Watson <tim at rabbitmq.com> wrote:

> Hi Matt,
>
> Usual request for any scrubbed log(s) you can provide please! :)
>
> Also, if you could confirm that /api/queues every 5 seconds is the only
> non-amqp traffic the broker(s) will be subject to, that'd be quite useful
> to know as well.
>
> Cheers,
>
> Tim
>
>
> On 13 Oct 2012, at 00:43, Matt Pietrek wrote:
>
> > We've hit an odd circumstance in production that we can't make heads or
> tails of. Our setup is two RabbitMQ 2.8.6 nodes running clustered with all
> HA queues.
> >
> > In front of them is a VIP managed by keepalive instances running on
> the same hosts as the RabbitMQ nodes. Every 5 seconds each keepalive
> instance runs a custom script that queries the node-local broker via the
> HTTP API, requesting the set of queues (/api/queues).
> >
> > From our logs, I can see that there was a single interval where the HTTP
> request returned a 404 error. (The time was the same on both brokers)
> Before this moment, all HTTP queries were successful, and after it, all
> queries were successful. As I said, just one blip on each broker.
> >
> > The only interesting thing we noticed in the logs was that the primary
> broker's rabbit at xxx-sasl file had this little snippet about 30 seconds
> after the hiccup occurred:
> >
> > =SUPERVISOR REPORT==== 12-Oct-2012::15:06:47 ===
> > Supervisor: {<0.25499.41>, rabbit_channel_sup_sup}
> > Context: shutdown_error
> > Reason: shutdown
> > Offender: [{pid,<0.25503.41>},
> > {name,channel_sup},
> > {mfa,{rabbit_channel_sup,start_link,[]}},
> > {restart_type,temporary},
> > {shutdown,infinity},
> > {child_type,supervisor}]
> >
> > I'm mostly at a loss as to what the snippet is telling me, nor can I
> tell whether the two things are related. Any help is appreciated!
> >
> > Thanks,
> >
> > Matt
> > _______________________________________________
> > rabbitmq-discuss mailing list
> > rabbitmq-discuss at lists.rabbitmq.com
> > https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss