[rabbitmq-discuss] RabbitMQ 2.0 hanging
Dave Greggory
davegreggory at yahoo.com
Wed Sep 8 16:36:30 BST 2010
We installed RMQ 2.0 yesterday in our QA environment and noticed that it had
hung this morning.
Environment setup:
- RabbitMQ 2.0 on Erlang R13B04, on a CentOS VMware VM
- 2 nodes clustered together, both as disk nodes
- Load balancer in front of both nodes, distributing connections to them round-robin
- Status plugin
- Message producers/consumers: Tomcat webapps using Spring-AMQP 1.0 M1 and
RabbitMQ client 1.8.1
- Very low message volume (practically none), as this is a dev/QA environment
8-Sep-2010::09:30~ - We couldn't start our Tomcat webapps on our local dev
machines this morning because they hung when attempting to connect to RabbitMQ
8-Sep-2010::09:40~ - Could not load the Status plugin web page
8-Sep-2010::09:40~ - rabbitmqctl status on node 1 indicated everything was OK
8-Sep-2010::09:40~ - rabbitmqctl list_queues hung on node 1
8-Sep-2010::09:45:38 - rabbitmqctl stop_app and start_app on node 1 didn't solve
the problem
8-Sep-2010::09:53:03 - rabbitmqctl stop and rabbitmq-server -detached on node 1
fixed the problem
No commands were run on node 2, because the person troubleshooting didn't have
access to that machine :)
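The telling symptom in the timeline is that `rabbitmqctl status` still answered while `rabbitmqctl list_queues` blocked. A minimal sketch of a watchdog that would catch that condition early, assuming `rabbitmqctl` is on the PATH; the helper names and the 30-second threshold are hypothetical choices for illustration, not anything from RabbitMQ itself, and the check only detects the symptom, it does not explain it.

```python
import subprocess

# Hypothetical threshold: how long a healthy `rabbitmqctl list_queues` may take.
DEFAULT_TIMEOUT_S = 30.0


def command_completes(cmd, timeout_s=DEFAULT_TIMEOUT_S):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns True if the command exited (with any exit status) in time,
    False if it was still running when the timeout expired.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return False
    return True


def broker_looks_hung(timeout_s=DEFAULT_TIMEOUT_S):
    # `status` kept answering during the incident while `list_queues`
    # blocked, so probe list_queues specifically.
    return not command_completes(["rabbitmqctl", "list_queues"], timeout_s)
```

Running `broker_looks_hung()` on each node from cron would at least record when the hang starts, which makes it easier to line up with the log timestamps.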
Something similar had also happened before on RabbitMQ 1.8.1. It happens roughly
once every two weeks in our QA environment (sometimes several times a day, but
then it runs fine for two weeks); it has never happened in our production
environment. We have both the Status and BQL plugins installed on RMQ 1.8.1 in
production, but only the Status plugin on the RMQ 2.0 that we're testing in QA.
We could try disabling the plugins, but I don't think that's the right way to
troubleshoot this: because the problem happens so rarely, a quiet spell after
disabling them might lead us to believe the problem was in a plugin when it
actually was not.
Logs from both nodes are attached.
=INFO REPORT==== 8-Sep-2010::09:30:52 ===
accepted TCP connection on from LOADBALANCER_IP:38339
=INFO REPORT==== 8-Sep-2010::09:30:52 ===
starting TCP connection <0.12371.17> from LOADBALANCER_IP:38339
=WARNING REPORT==== 8-Sep-2010::09:30:52 ===
exception on TCP connection <0.12371.17> from LOADBALANCER_IP:38339
=INFO REPORT==== 8-Sep-2010::09:30:52 ===
closing TCP connection <0.12371.17> from LOADBALANCER_IP:38339
There are plenty of these in the logs; they come from our load balancer checking
periodically (once every minute) whether RabbitMQ is alive by opening and
closing a TCP connection. I've been told this is harmless.
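The load balancer's check amounts to a bare TCP connect followed by an immediate close, without sending any AMQP bytes, which is likely why the broker logs each probe as an exception on the connection. A minimal sketch of an equivalent probe, assuming the default AMQP port 5672 (the post does not state the port); `tcp_alive` and `AMQP_PORT` are hypothetical names.

```python
import socket

# Assumption: the brokers listen on the default AMQP port; the post does not say.
AMQP_PORT = 5672


def tcp_alive(host, port=AMQP_PORT, timeout_s=5.0):
    """Open a TCP connection and close it immediately, like the load balancer.

    Returns True if the connect succeeded within timeout_s, else False.
    No AMQP protocol header is sent, so this only shows that the listener
    accepts connections, not that the broker can actually serve requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connected; leaving the block closes the socket
    except OSError:  # refused, unreachable, or timed out
        return False
    return True
```

A check this shallow cannot tell a broker that accepts sockets but is otherwise stuck from a healthy one, so a passing probe says little about the kind of hang described above.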
There are, however, more interesting/suspicious entries in both nodes' logs:
around 09:45:38 on node 1, and around 09:41:38 and 09:47:08 on node 2.
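Because the once-a-minute health checks bury everything else, it helps to strip them out before reading the logs. A minimal sketch, assuming the `=LEVEL REPORT==== timestamp ===` header layout shown in the excerpt above; `LOADBALANCER_IP` is the post's own placeholder, so the real address would have to be substituted, and the helper names are hypothetical.

```python
import re

# Matches report headers such as "=INFO REPORT==== 8-Sep-2010::09:30:52 ===".
HEADER = re.compile(r"^=(\w+) REPORT==== (\S+) ===$")


def parse_reports(lines):
    """Group log lines into (level, timestamp, body) tuples."""
    reports = []
    for line in lines:
        line = line.rstrip("\n")
        m = HEADER.match(line)
        if m:
            reports.append((m.group(1), m.group(2), []))
        elif reports:
            reports[-1][2].append(line)  # body line of the current report
    # Join each body and trim surrounding blank lines.
    return [(level, ts, "\n".join(body).strip()) for level, ts, body in reports]


def without_health_checks(reports, lb_address):
    """Drop reports whose body mentions the load balancer's address."""
    return [r for r in reports if lb_address not in r[2]]
```

For example, calling `without_health_checks(parse_reports(f), "LOADBALANCER_IP")` on an open `rabbitmq-node1.log` would leave only the entries that did not come from the probe, including whatever was logged around 09:45:38.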
I hope you can help me figure out the root cause of the problem.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbitmq-node2.log
Type: application/octet-stream
Size: 29832 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20100908/5cc34088/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbitmq-node1.log
Type: application/octet-stream
Size: 24374 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20100908/5cc34088/attachment-0003.obj>