[rabbitmq-discuss] *Advisory* Clustering not working for some connections
Michael Bridgen
mikeb at rabbitmq.com
Mon Oct 25 14:04:30 BST 2010
All,
We are pretty sure we have traced this to the inter-node routing code
used in clustering.
It's quite hard to reproduce, and in general seems uncommon; however,
/if/ you are using clustering /and/ you see these symptoms:
- publishing across nodes stops working
- rabbitmqctl list_queues or list_connections doesn't respond
- RabbitMQ has to be hard-restarted
it is likely to be this problem, and we advise trying to adapt your
set-up to work without clustering for the time being.
We are working on fixing it in the next release.
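
To check for the second symptom without leaving a hung terminal, the rabbitmqctl call can be wrapped in a timeout. The following is only a minimal sketch in Python, not part of RabbitMQ: the helper names responds_within and check_rabbitmqctl and the 30-second default are illustrative assumptions, and it presumes rabbitmqctl is on the PATH of the user running it.

```python
import subprocess


def responds_within(command, timeout_seconds):
    """Return True if `command` exits within `timeout_seconds`, else False.

    Hypothetical helper: only the time to completion is checked,
    the exit status is deliberately ignored.
    """
    try:
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,  # the child is killed when this expires
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


def check_rabbitmqctl(timeout_seconds=30):
    """Probe the two rabbitmqctl subcommands named in the advisory.

    The 30-second default is an assumed threshold, not a documented one.
    Raises FileNotFoundError if rabbitmqctl is not installed.
    """
    return {
        subcommand: responds_within(["rabbitmqctl", subcommand], timeout_seconds)
        for subcommand in ("list_queues", "list_connections")
    }
```

Calling check_rabbitmqctl() on each node returns a dict such as {"list_queues": ..., "list_connections": ...}; a False value only means the command did not finish in time, which matches the symptom but does not by itself prove this bug is the cause.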
Michael
--
Michael Bridgen
Staff Engineer, RabbitMQ
> I just noticed that in my tests the producers are also getting
> blocked (we are using the Java client and basicPublish()).
>
> Trying to list_consumers also becomes unresponsive at this point.
>
> Any help would be really appreciated.
>
> Thank you,
>
> --
> Ivan Sanchez
>
> On Oct 21, 3:55 pm, Ivan Sanchez <s4nc... at gmail.com> wrote:
>> Hi all,
>>
>> We are trying to run a cluster of 2 rabbitmq machines on Amazon EC2
>> and although it runs fine for a little while, at some stage it stops
>> working only for messages where producer and consumer are connected to
>> different nodes. At this point, "rabbitmqctl list_connections" becomes
>> completely unresponsive, as do attempts to restart the servers. The
>> only option is to kill -9 all Erlang processes and start them again.
>>
>> rabbitmqctl status shows:
>>
>> Status of node rabbit@rabbit1 ...
>> [{running_applications,
>> [{rabbit_management,"RabbitMQ Management Console","2.1.1"},
>> {webmachine,"webmachine","1.7.0"},
>> {amqp_client,"RabbitMQ AMQP Client","2.1.1"},
>> {rabbit,"RabbitMQ","2.1.0"},
>> {os_mon,"CPO CXC 138 46","2.2.5"},
>> {sasl,"SASL CXC 138 11","2.1.9"},
>> {rabbit_mochiweb,"RabbitMQ Mochiweb Embedding","2.1.1"},
>> {mochiweb,"MochiMedia Web Server","1.3"},
>> {crypto,"CRYPTO version 1","1.6.4"},
>> {inets,"INETS CXC 138 49","5.3"},
>> {mnesia,"MNESIA CXC 138 12","4.4.13"},
>> {stdlib,"ERTS CXC 138 10","1.16.5"},
>> {kernel,"ERTS CXC 138 10","2.13.5"}]},
>> {nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
>> {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
>> ...done.
>>
>> Status of node rabbit@rabbit2 ...
>> [{running_applications,
>> [{rabbit_management,"RabbitMQ Management Console","2.1.1"},
>> {webmachine,"webmachine","1.7.0"},
>> {amqp_client,"RabbitMQ AMQP Client","2.1.1"},
>> {rabbit,"RabbitMQ","2.1.0"},
>> {os_mon,"CPO CXC 138 46","2.2.5"},
>> {sasl,"SASL CXC 138 11","2.1.9"},
>> {rabbit_mochiweb,"RabbitMQ Mochiweb Embedding","2.1.1"},
>> {mochiweb,"MochiMedia Web Server","1.3"},
>> {crypto,"CRYPTO version 1","1.6.4"},
>> {inets,"INETS CXC 138 49","5.3"},
>> {mnesia,"MNESIA CXC 138 12","4.4.13"},
>> {stdlib,"ERTS CXC 138 10","1.16.5"},
>> {kernel,"ERTS CXC 138 10","2.13.5"}]},
>> {nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
>> {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
>> ...done.
>>
>> In the logs of rabbit2, the only errors I see are some of these:
>>
>> =ERROR REPORT==== 21-Oct-2010::14:40:47 ===
>> exception on TCP connection <0.19069.0> from 88.211.55.18:13580
>> {bad_header,<<"<policy-">>}
>>
>> Other information:
>> - The hostnames (rabbit1, rabbit2) are defined in /etc/hosts on both
>> machines using their private IP, and consumers access them through a
>> DNS round-robin to their public IP
>> - Both machines use NODENAME=rabbit@<host> in
>> /etc/rabbitmq/rabbitmq.conf
>> - Cluster is defined in /etc/rabbitmq/rabbitmq.config using
>> {cluster_nodes, ['rabbit@rabbit1','rabbit@rabbit2']}
>> - We are using RabbitMQ 2.1.0 and Erlang R13B04 (erts-5.7.5)
>> [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe]
>> [kernel-poll:false]
>>
>> Any ideas of what can be wrong?
>>
>> --
>> Ivan Sanchez
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-disc... at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss