[rabbitmq-discuss] Clustering not working for some connections
Ivan Sanchez
s4nchez at gmail.com
Thu Oct 21 15:55:05 BST 2010
Hi all,
We are trying to run a cluster of 2 RabbitMQ machines on Amazon EC2.
It runs fine for a little while, but at some stage it stops working,
and only for messages where producer and consumer are connected to
different nodes. At this point, "rabbitmqctl list_connections" becomes
completely unresponsive, and so do attempts to restart the servers. The
only option is to kill -9 all Erlang processes and start them again.
rabbitmqctl status shows:
Status of node rabbit@rabbit1 ...
[{running_applications,
[{rabbit_management,"RabbitMQ Management Console","2.1.1"},
{webmachine,"webmachine","1.7.0"},
{amqp_client,"RabbitMQ AMQP Client","2.1.1"},
{rabbit,"RabbitMQ","2.1.0"},
{os_mon,"CPO CXC 138 46","2.2.5"},
{sasl,"SASL CXC 138 11","2.1.9"},
{rabbit_mochiweb,"RabbitMQ Mochiweb Embedding","2.1.1"},
{mochiweb,"MochiMedia Web Server","1.3"},
{crypto,"CRYPTO version 1","1.6.4"},
{inets,"INETS CXC 138 49","5.3"},
{mnesia,"MNESIA CXC 138 12","4.4.13"},
{stdlib,"ERTS CXC 138 10","1.16.5"},
{kernel,"ERTS CXC 138 10","2.13.5"}]},
{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
...done.
Status of node rabbit@rabbit2 ...
[{running_applications,
[{rabbit_management,"RabbitMQ Management Console","2.1.1"},
{webmachine,"webmachine","1.7.0"},
{amqp_client,"RabbitMQ AMQP Client","2.1.1"},
{rabbit,"RabbitMQ","2.1.0"},
{os_mon,"CPO CXC 138 46","2.2.5"},
{sasl,"SASL CXC 138 11","2.1.9"},
{rabbit_mochiweb,"RabbitMQ Mochiweb Embedding","2.1.1"},
{mochiweb,"MochiMedia Web Server","1.3"},
{crypto,"CRYPTO version 1","1.6.4"},
{inets,"INETS CXC 138 49","5.3"},
{mnesia,"MNESIA CXC 138 12","4.4.13"},
{stdlib,"ERTS CXC 138 10","1.16.5"},
{kernel,"ERTS CXC 138 10","2.13.5"}]},
{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
In the logs of rabbit2, the only errors I see are some of these:
=ERROR REPORT==== 21-Oct-2010::14:40:47 ===
exception on TCP connection <0.19069.0> from 88.211.55.18:13580
{bad_header,<<"<policy-">>}
Other information:
- The hostnames (rabbit1, rabbit2) are defined in /etc/hosts on both
machines using their private IPs, and consumers access them through a
DNS round-robin to their public IPs
- Both machines use NODENAME=rabbit@<host> in
/etc/rabbitmq/rabbitmq.conf
- Cluster is defined in /etc/rabbitmq/rabbitmq.config using
{cluster_nodes, ['rabbit@rabbit1','rabbit@rabbit2']}
- We are using RabbitMQ 2.1.0 and Erlang R13B04 (erts-5.7.5)
[source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe]
[kernel-poll:false]
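For reference, a minimal sketch of how the cluster definition sits in /etc/rabbitmq/rabbitmq.config. Only the cluster_nodes entry is taken from our setup; the surrounding layout (a single Erlang term list with a rabbit application section) is assumed to be the usual one, and any other settings are omitted:

```erlang
%% /etc/rabbitmq/rabbitmq.config
%% Sketch only: the enclosing structure is assumed, not copied from the servers.
[
  {rabbit, [
    %% Nodes this broker should cluster with, written as quoted atoms
    {cluster_nodes, ['rabbit@rabbit1', 'rabbit@rabbit2']}
  ]}
].
```

The same file is used on both machines, and the node names match the NODENAME values and the /etc/hosts entries described above.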
Any ideas what could be wrong?
--
Ivan Sanchez