[rabbitmq-discuss] RabbitMQ stops responding for a few seconds, logs a "Context: shutdown_error", then resumes as if nothing happened

Tim Watson tim at rabbitmq.com
Mon Oct 15 20:50:11 BST 2012


Matt,

Ok no problem, but do let us know if anything like this shows up again!

Cheers,
Tim


On 15 Oct 2012, at 18:50, Matt Pietrek wrote:

> Tim,
> 
> Unfortunately, the RabbitMQ logs are rather silent around the time of this observed behavior. However, looking at other logs, I'm starting to suspect that something triggered rsyslog (running on the same box) to send hundreds of thousands of old log lines to another server, potentially starving other processes (e.g. Rabbit) and keeping them from responding during this time.
> 
> I'll update this thread if I subsequently find something else beyond this. For now, I'm inclined to call this closed.
> 
> On Mon, Oct 15, 2012 at 12:25 AM, Tim Watson <tim at rabbitmq.com> wrote:
> Hi Matt,
> 
> Usual request for any scrubbed log(s) you can provide please! :)
> 
> Also, if you could confirm that /api/queues every 5 seconds is the only non-AMQP traffic the broker(s) will be subject to, that'd be quite useful to know as well.
> 
> Cheers,
> 
> Tim
> 
> 
> On 13 Oct 2012, at 00:43, Matt Pietrek wrote:
> 
> > We've hit an odd circumstance in production that we can't make heads or tails of. Our setup is two RabbitMQ 2.8.6 nodes running clustered with all HA queues.
> >
> > In front of them is a VIP managed by keepalive instances running on the same hosts as the RabbitMQ nodes. Every 5 seconds, the keepalive instance runs a custom script that queries the node-local broker via the HTTP API, requesting the set of queues (/api/queues).
> >
> > From our logs, I can see that there was a single interval where the HTTP request returned a 404 error (at the same time on both brokers). Before this moment, all HTTP queries were successful, and after it, all queries were successful. As I said, just one blip on each broker.
> >
> > The only interesting thing we noticed in the logs was that the primary broker's rabbit@xxx-sasl file had this little snippet about 30 seconds after the hiccup occurred:
> >
> > =SUPERVISOR REPORT==== 12-Oct-2012::15:06:47 ===
> > Supervisor: {<0.25499.41>, rabbit_channel_sup_sup}
> > Context: shutdown_error
> > Reason: shutdown
> > Offender: [{pid,<0.25503.41>},
> > {name,channel_sup},
> > {mfa,{rabbit_channel_sup,start_link,[]}},
> > {restart_type,temporary},
> > {shutdown,infinity},
> > {child_type,supervisor}]
> >
> > I'm mostly at a loss as to what the snippet is telling me, nor can I tell whether the two things are related. Any help is appreciated!
> >
> > Thanks,
> >
> > Matt
> > _______________________________________________
> > rabbitmq-discuss mailing list
> > rabbitmq-discuss at lists.rabbitmq.com
> > https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> 

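The custom keepalive script was never posted to the list, so the following is only a rough sketch of the kind of check described in the original message: poll /api/queues on the node-local broker and treat a single failed request (such as the one-off 404) as transient, reporting the node unhealthy only after several consecutive failures. The names fetch_queues and HealthCheck, the threshold of 3, the guest credentials, and the port are all assumptions for illustration, not details from the thread.

```python
import base64
import json
import urllib.error
import urllib.request


def fetch_queues(base_url, user, password, timeout=3.0):
    """Return the parsed /api/queues response; raises on HTTP or network errors."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses, including a 404.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class HealthCheck:
    """Hypothetical helper: unhealthy only after max_failures consecutive failed probes."""

    def __init__(self, probe, max_failures=3):
        self.probe = probe              # zero-argument callable that raises on failure
        self.max_failures = max_failures
        self.failures = 0               # current run of consecutive failures

    def poll(self):
        """Run one probe and return True while the node still counts as healthy."""
        try:
            self.probe()
        except (urllib.error.URLError, OSError, ValueError):
            self.failures += 1
        else:
            self.failures = 0           # any success resets the run
        return self.failures < self.max_failures


# Example wiring (assumed values; the management plugin listened on port 55672
# in the 2.8.x series, adjust for your installation):
#
#   check = HealthCheck(lambda: fetch_queues("http://localhost:55672", "guest", "guest"))
#   healthy = check.poll()   # call once per 5-second keepalive interval
```

With a threshold like this, a single blip of the sort reported here would be counted but would not by itself mark the node as down; whether that trade-off is acceptable depends on how quickly the VIP needs to fail over.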