[rabbitmq-discuss] Feature Req / Bug list

Graeme N graeme at sudo.ca
Tue Oct 29 00:06:09 GMT 2013


More fun with RabbitMQ clustering!

I wouldn't believe these next two if I couldn't consistently reproduce
them. Attached is a new script package, which includes some updates to the
previous scripts as well as the Go-based queue populator.

First is an issue where, if you apply a global policy and then populate a
bunch of queues, RabbitMQ removes about three-quarters of them once the
queues are done populating. It is baffling. I've attached screenshots,
since I can't really believe it myself. To reproduce:

- ./create_cluster.sh && ./setup_queues.sh &&
  RABBITMQ_NODENAME="rabbit1@localhost" rabbitmqctl set_policy \
    --priority 0 global_pol ".*" \
    '{"ha-mode": "exactly", "ha-params": 3, "ha-sync-mode": "automatic"}' &&
  sleep 5 && ./populate_queues.sh
- watch the cluster admin queues page (http://localhost:4441/#/queues): all
the queues fill up with 10000 messages, and then about two-thirds of them
disappear once most of the queues are full.
- This happens both in my test VM, and on a bare metal server with 64GB of
RAM and 24 cores.
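
To pin down exactly which queues vanish, rather than eyeballing the admin
page, the check above can be sketched as a diff of two queue-name
snapshots. This is only a rough sketch: missing_queues, before.txt and
after.txt are hypothetical names of mine, not part of the attached
scripts, and it assumes one queue name per line in each snapshot.

```shell
#!/bin/sh
# Print queue names present in the "before" snapshot but absent from "after".
# Each snapshot is a file with one queue name per line, for example from
# (assumed invocation, not taken from the attached scripts):
#   RABBITMQ_NODENAME="rabbit1@localhost" rabbitmqctl -q list_queues name > before.txt
missing_queues() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || { rm -f "$before_sorted"; return 1; }
    # comm needs sorted input; -u also drops any duplicate names.
    sort -u "$1" > "$before_sorted"
    sort -u "$2" > "$after_sorted"
    # comm -23 keeps only the lines unique to the first file.
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

Take one snapshot right after ./setup_queues.sh and another once the
queues have stopped filling, then run
"missing_queues before.txt after.txt" to list the ones that were removed.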

Next is an issue where removing and re-adding nodes breaks RabbitMQ
clustering. We keep running into this in production: we attempt to adjust
our cluster topology, things break, and we have to take the whole cluster
down and bring it back up again to fix it.

- ./create_cluster.sh && ./setup_queues.sh && ./populate_queues.sh &&
  RABBITMQ_NODENAME="rabbit1@localhost" rabbitmqctl set_policy \
    --priority 0 global_pol ".*" \
    '{"ha-mode": "exactly", "ha-params": 3, "ha-sync-mode": "automatic"}'
- watch the cluster admin pages and wait until all messages are populated
and the queues are synced. Note that since we applied the policy after
populating the queues, for some reason the queues are not removed as they
were in the previous case.
- ./toggle_nodes.sh
- watch nodes be removed and re-added. It should only take ~5 of these full
cycles before the script loop hangs and doesn't return from the cluster
operation it's attempting to perform. If you Ctrl-C the script and run it
again, it should just hang and refuse to perform any more cluster
join/leave operations on any node.
- it's also likely you'll see queues where one of the mirrors isn't
correctly synced, or possibly is partially synced but stuck and not
finishing syncing, likely related to previous policy bugs I reported.
- queues in these states often don't accept new messages for delivery,
stalling message processing.
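
To make the hang in the steps above show up as an explicit failure
instead of a stuck loop, each cluster operation could be wrapped in a
deadline. This is a minimal sketch, assuming GNU coreutils "timeout" is
installed; run_with_deadline is a hypothetical helper of mine and is not
part of toggle_nodes.sh.

```shell
#!/bin/sh
# Run a command, but give up after a deadline so a wedged cluster operation
# is reported instead of blocking the loop forever.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    deadline=$1
    shift
    status=0
    timeout "$deadline" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with status 124 when the deadline expires.
        echo "timed out after ${deadline}s: $*" >&2
    fi
    return "$status"
}
```

For example, "run_with_deadline 60 rabbitmqctl -n rabbit2@localhost
stop_app" (the node name here is only an illustration) would return 124
if the operation never completes, which gives a timestamped point at
which the cluster stopped responding.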

We've found that once the cluster is in this state, it behaves really oddly
and needs to be fully shut down (or "killall beam.smp") and then brought
back up before it behaves normally. We had an incident after adding a
single node last Friday where ~4 queues stopped accepting new messages and
held up our entire workload until the whole cluster was shut down and
brought back up.

These errors are all really strange, and so I'm hoping you guys can
reproduce them, and best case scenario, find something that accounts for
these problems which we can then patch in our production environment.

Thanks!
Graeme
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1-rmq-before-populating.png
Type: image/png
Size: 295282 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20131028/1e360077/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2-rmq-some-missing.png
Type: image/png
Size: 302599 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20131028/1e360077/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3-rmq-more-missing.png
Type: image/png
Size: 233218 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20131028/1e360077/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4-rmq-most-missing.png
Type: image/png
Size: 150011 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20131028/1e360077/attachment-0007.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: break-rabbitmq-v2.tar.xz
Type: application/x-xz
Size: 951556 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20131028/1e360077/attachment-0001.bin>

